Gene Expression Profiles in Normal and Cancer Cells

This invention was made with support from the National Institutes of 
Health, Grant No. GM07309, CA57345, and CA62924. The U.S. government 
therefore retains certain rights in the invention. 



FIELD OF THE INVENTION

This invention is related to the diagnosis of cancer and to tools for
carrying out such diagnosis.

BACKGROUND OF THE INVENTION

Much of cancer research over the past 50 years has been devoted to the 
analyses of genes that are expressed differently in tumor cells compared to their 
normal counterparts. Although hundreds of studies have pointed out 
differences in the expression of one or a few genes, no comprehensive study of 
gene expression in the cancer cell has been reported. It is therefore not known 
how many genes are expressed differentially in tumor versus normal cells,
whether the bulk of these differences are cell autonomous rather than being 
dependent on the tumor microenvironment, and whether most differences are 
cell-type specific or tumor specific. Thus there is a need in the art for 
information on the molecular changes that occur in cells during cancer 
development and progression. 



SUMMARY OF THE INVENTION

According to one embodiment of the invention, a method is provided 
for diagnosing colon cancer in a sample suspected of being neoplastic. The
method comprises the steps of:

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the
group consisting of those shown in Table 3; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be lower in the first sample than in the second 
sample. 

According to another embodiment of the invention, another method is 
provided for diagnosing colon cancer in a sample suspected of being neoplastic. 
The method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the 
group consisting of those shown in Table 2; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

In another embodiment of the invention an isolated and purified human 
nucleic acid molecule is provided. The molecule comprises a SAGE tag 
selected from SEQ ID NO: 1-732. 

In yet another aspect of the invention an isolated nucleotide probe is 
provided. The probe comprises at least 12 nucleotides of a human nucleic acid 
molecule, wherein the human nucleic acid molecule comprises a SAGE tag 
selected from SEQ ID NO: 1-732. 



According to another aspect of the invention a method is provided for 
diagnosing pancreatic cancer in a sample suspected of being neoplastic. The 
method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a pancreatic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colon tissue, wherein said transcript is identified by a tag selected from the 
group consisting of those shown in Table 4;

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

According to still another embodiment of the invention a method of 
diagnosing cancer in a sample suspected of being neoplastic is provided. The 
method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a tissue suspected of 
being neoplastic and the second sample is of a normal human tissue, wherein 
said transcript is identified by a tag selected from the group consisting of those 
shown in Table 5;

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

According to another embodiment of the invention a method is 
provided to aid in the determination of a prognosis for a colon cancer patient. 
The method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic colonic 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those 
shown in Table 3; 



determining a poorer prognosis if the level of the at least one 
transcript is found to be lower in the first sample than in the second sample. 

According to another aspect of the invention a method to aid in 
determining a prognosis for a patient with colon cancer is provided. The
method comprises the steps of:

comparing the level of at least one transcript in a first tissue 
sample to a second sample, wherein the first sample is of a colonic cancer 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those
shown in Table 2;

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

In yet another embodiment of the invention a method is provided for 
diagnosing colon cancer in a sample suspected of being neoplastic. The 
method comprises the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table 
3; 

identifying the first sample as neoplastic when the level of 
expression of the protein is found to be lower in the first sample than in the 
second sample. 

In another aspect of the invention a method of diagnosing colon cancer 
in a sample suspected of being neoplastic is provided. The method comprises 
the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table
2; 

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

According to another embodiment of the invention a method is 
provided to aid in determining a prognosis of a patient having pancreatic 
cancer. The method comprises the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic 
pancreatic tissue and the second sample is of a normal human colon tissue, 
wherein said transcript is identified by a tag selected from the group consisting 
of those shown in Table 4;

determining a poorer prognosis if transcription is found to be
higher in the first sample than in the second sample.

In yet another aspect of the invention a method to aid in providing a 
prognosis for a cancer patient is provided. The method comprises the steps of:

comparing the level of at least one transcript in a first sample of
a tissue to a second sample, wherein the first sample is of a neoplastic tissue
and the second sample is of a normal human tissue of the same tissue type,
wherein said transcript is identified by a tag selected from the group consisting
of those shown in Table 5;

determining a poorer prognosis if transcription is found to be
higher in the first sample than in the second sample.

According to still another aspect of the invention, a method is provided 
for diagnosing pancreatic cancer in a sample suspected of being neoplastic. 
The method comprises the steps of: 

comparing the level of expression of at least one protein 
encoded by a transcript in a first sample of a tissue to a second sample, wherein 
the first sample is of a pancreatic tissue suspected of being neoplastic and the 
second sample is of a normal human colon tissue, wherein said protein is
encoded by a transcript identified by a tag selected from the group consisting
of those shown in Table 4;

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

According to yet another aspect of the invention a method is provided 
for diagnosing cancer in a sample suspected of being neoplastic. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
tissue suspected of being neoplastic and the second sample is of a normal 
human tissue, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 5;

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

In still another embodiment of the invention a method is provided to aid 
in the determination of a prognosis of a colon cancer patient. The method 
comprises the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic colonic tissue and the second sample is of a normal human colonic 
tissue, and wherein the protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 3; 

determining a poorer prognosis if the level of expression is 
found to be lower in the first sample than in the second sample. 

In still another embodiment of the invention a method is provided to aid 
in determining a prognosis for a patient with colon cancer. The method
comprises the steps of: 

comparing the level of expression of at least one protein in a 
first tissue sample to a second sample, wherein the first sample is of a colonic 
cancer tissue and the second sample is of a normal human colonic tissue, and
wherein the protein is encoded by a transcript identified by a tag selected from
the group consisting of those shown in Table 2;

determining a poorer prognosis if the level of expression is
found to be higher in the first sample than in the second sample.

In still another aspect of the invention a method is provided to aid in 
determining a prognosis of a patient having pancreatic cancer. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic pancreatic tissue and the second sample is of a normal human colon 
tissue, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 4;

determining a poorer prognosis if the level of expression is
found to be higher in the first sample than in the second sample.

According to even a further aspect of the invention a method is 
provided to aid in providing a prognosis for a cancer patient. The method
comprises the steps of:

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic tissue and the second sample is of a normal human tissue of the same 
tissue type, wherein said protein is encoded by a transcript identified by a tag
selected from the group consisting of those shown in Table 5;

determining a poorer prognosis if the level of expression is
found to be higher in the first sample than in the second sample.

In still another embodiment of the invention a method of treating a 
cancer cell is provided. The method comprises the step of: 

administering to a cancer cell an antibody which specifically 
binds to a protein encoded by a transcript identified by a tag selected from the 
group consisting of those shown in Tables 2, 4, and 5, wherein the antibody is 
linked to a cytotoxic agent. 
In another aspect of the invention an antibody linked to a cytotoxic
agent is provided. The antibody specifically binds to a protein encoded by a 
transcript identified by a tag selected from the group consisting of those shown
in Tables 2, 4, and 5.

According to another aspect of the invention, a method of detecting 
colon cancer in a patient is provided. The method comprises the steps of: 

comparing the level of at least one protein or transcript in a first 
body sample to a second body sample, wherein the first sample is a body 
sample of the patient and the second sample is of a normal human, wherein the 
protein is encoded by a transcript and the transcript is identified by a tag 
selected from the group consisting of those shown in Table 2, wherein the first 
and second body sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

identifying neoplasia when the level of the at least one protein 
or transcript is found to be higher in the first sample than in the second sample. 

In another aspect of the invention a method of detecting pancreatic 
cancer in a patient is provided. The method comprises the steps of: 

comparing the level of at least one protein or transcript encoded 
by a transcript in a first sample of a tissue to a second sample, wherein the first 
sample is of the patient and the second sample is of a normal human, wherein 
said protein is encoded by a transcript and the transcript is identified by a tag 
selected from the group consisting of those shown in Table 4, wherein the first
and second sample is a sample selected from the group consisting of blood,
urine, feces, sputum, and serum;

identifying neoplasia when the level of the at least one protein 
or transcript is found to be higher in the first sample than in the second sample. 

Also provided by the present invention is a method of detecting cancer 
in a patient. The method comprises the steps of: 

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of the patient and the
second sample is of a normal human, wherein said protein is encoded by a
transcript and the transcript is identified by a tag selected from the group
consisting of those shown in Table 5, wherein the first and second body sample
is a sample selected from the group consisting of blood, urine, feces, sputum, 
and serum; 

identifying neoplasia when the level of the at least one protein 
or transcript is found to be higher in the first sample than in the second sample. 

Additionally provided by the present invention is a method to aid in the 
determination of a prognosis for a colon cancer patient. The method comprises
the steps of: 

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of a colon cancer patient 
and the second sample is of a normal human, wherein the protein is encoded 
by a transcript and the transcript is identified by a tag selected from the group 
consisting of those shown in Table 3, wherein the first and second body sample 
is a sample selected from the group consisting of blood, urine, feces, sputum, 
and serum; 

determining a poorer prognosis if the level of the at least one 
protein or transcript is found to be lower in the first sample than in the second 
sample. 

Provided by another embodiment of the invention is a method to aid 
in determining a prognosis for a patient with colon cancer. The method
comprises the steps of:

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of a colonic cancer 
patient and the second sample is of a normal human, wherein the protein is 
encoded by a transcript and the transcript is identified by a tag selected from 
the group consisting of those shown in Table 2, wherein the first and second 
sample is a sample selected from the group consisting of blood, urine, feces, 
sputum, and serum; 
determining a poorer prognosis if the level of the at least one
protein or transcript is found to be higher in the first sample than in the second 
sample. 

According to still another aspect of the invention, a method to aid in 
determining a prognosis of a patient having pancreatic cancer is provided. The
method comprises the steps of: 

comparing the level of at least one protein or transcript in a first 
sample to a second sample, wherein the first sample is of a pancreatic cancer 
patient and the second sample is of a normal human, wherein said protein is 
encoded by a transcript and the transcript is identified by a tag selected from 
the group consisting of those shown in Table 4, wherein said first and second
sample is a sample selected from the group consisting of blood, urine, feces,
sputum, and serum;

determining a poorer prognosis if the level of the at least one 
protein or transcript is found to be higher in the first sample than in the second 
sample. 

Also provided by the present invention is a method to aid in providing 
a prognosis for a cancer patient. The method comprises the steps of: 

comparing the level of expression of at least one protein or 
transcript in a first sample to a second sample, wherein the first sample is of a 
cancer patient and the second sample is of a normal human, wherein said 
protein is encoded by a transcript and the transcript is identified by a tag 
selected from the group consisting of those shown in Table 5, wherein the first
and second sample is a sample selected from the group consisting of blood,
urine, feces, sputum, and serum;

determining a poorer prognosis if the level of the at least one
protein or transcript is found to be higher in the first sample than in the second 
sample. 

The present invention further includes antisense oligonucleotides
complementary in whole or in part to SEQ ID NOS: 1-732. 
This invention also provides a method for screening for candidate
agents that modulate the expression of a polynucleotide selected from the group
consisting of the polynucleotides in SEQ ID NOS: 1-732 or their respective
complements, by contacting a test agent with a pancreatic or colon cell and 
monitoring expression of the polynucleotide, wherein the test agent which 
modifies the expression of the polynucleotide is a candidate agent. 

The present invention provides the art with new methods and reagents 
for diagnosing and prognosing cancers. In addition, some of the newly 
disclosed genes may play an important role in the development of cancers. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1. Comparison of expression patterns in colorectal cancers and normal 
colon epithelium. (FIG. 1A) A semi-logarithmic plot reveals 51 tags that
were decreased more than 10-fold in primary CR cancer cells, whereas 32 tags
were increased more than 10-fold. A total of 62,168 and 60,878 tags derived from
normal colon epithelium and primary CR cancers, respectively, were used for
this analysis. The relative expression of each transcript was determined by
dividing the number of tags observed in tumor tissue by the number observed
in normal tissue, or vice versa, as indicated.
To avoid division by 0, a tag value of 1 was used for any tag that was not
detectable in one of the samples. These ratios were then rounded to the
nearest integer, and their distribution was plotted on the abscissa. The number of
genes displaying each ratio was plotted on the ordinate. Tu: CR tumors; NC:
normal colon. (FIG. 1B and FIG. 1C) Differentially expressed genes in
colorectal cancers. The number of transcripts found to be differentially
expressed (P < 0.01) is presented as Venn diagrams of transcripts
that were decreased (FIG. 1B) or increased (FIG. 1C) in CR cancers
compared to normal colon epithelium. Comparisons were between primary
tumors and cells in culture, as indicated.
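The ratio calculation described for FIG. 1A can be sketched as follows. This is a minimal illustration, assuming raw tag counts per library: an undetected tag is assigned a count of 1, and each ratio is rounded half-up to the nearest integer, as the caption describes. The caption does not state which statistical test produced the P < 0.01 cutoff for FIG. 1B and FIG. 1C, so `two_sided_binomial_p` below is a commonly used substitute (a conditional binomial test on two tag counts), not necessarily the method used for the figure. All function names are hypothetical.

```python
import math


def tag_ratio(numerator_count: int, denominator_count: int) -> int:
    """Ratio of tag counts between two libraries, as described for FIG. 1A.

    A tag that was not detected (count 0) is assigned a value of 1 to avoid
    division by 0; the ratio is then rounded half-up to the nearest integer.
    """
    ratio = max(numerator_count, 1) / max(denominator_count, 1)
    return math.floor(ratio + 0.5)


def two_sided_binomial_p(x: int, y: int, n1: int, n2: int) -> float:
    """Two-sided conditional binomial test for one tag in two libraries.

    x, y:   counts of the tag in libraries 1 and 2.
    n1, n2: total tags sequenced in libraries 1 and 2.
    Under the null hypothesis the tag has the same relative abundance in both
    libraries, so given k = x + y, x follows Binomial(k, n1 / (n1 + n2)).
    """
    k = x + y
    p = n1 / (n1 + n2)
    probs = [math.comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k + 1)]
    observed = probs[x]
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small tolerance treats floating-point near-ties as ties.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Library sizes from FIG. 1A: primary tumors (Tu) and normal colon (NC).
    tu_total, nc_total = 60_878, 62_168
    # Hypothetical counts for a single tag.
    tu, nc = 0, 25
    print("NC/Tu ratio:", tag_ratio(nc, tu))
    print("P =", two_sided_binomial_p(tu, nc, tu_total, nc_total))
```

Rounding half-up with `math.floor(ratio + 0.5)` avoids Python's built-in `round`, which rounds exact halves to the nearest even integer.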

Fig. 2. Northern blot analysis of genes differentially expressed in 
gastrointestinal neoplasia. Northern blot analysis was performed on total RNA 
(5 µg) isolated from primary CR carcinomas (T) and matching normal colon
epithelium (N), or pancreatic carcinomas. The top panel in each case shows an
example of the ethidium bromide stained gels prior to transfer. The number of
SAGE tags observed in the original analysis is indicated to the right of each
blot. (FIG. 2A) Examples of transcripts that were decreased or increased in
CR cancers. (FIG. 2B) Examples of transcripts increased in pancreatic cancers
(70). (FIG. 2C) Examples of transcripts elevated in cancer that were or
were not cancer-type specific. Probes used for Northern blot analysis were as
follows (Human SAGE Tag unique identifier, gene name (GenBank accession
number)): (FIG. 2A) H204104, Guanylin (M95714); H259108, (see Table 2);
H1000193, (see Table 2); H998030, (see Table 2). (FIG. 2B) H294155,
RIG-E (U42376); H560056, TIMP-1 (S68252). (FIG. 2C) H802810,
EST338411 (W52120); H85882, 1-8D (X57351); H618841, GA733-1
(X13425).

Tables 2-5. Transcripts Differentially Expressed in Human Cancer.

Tag sequence represents the NlaIII site plus the adjacent 11 bp SAGE tag.
Tag number indicates a SAGE UID (unique identifier). NC, TU, CL, PT, and PC
refer to the number of the indicated tag observed in RNA isolated from
normal colorectal epithelium, primary colorectal cancers, colorectal cancer cell
lines, primary pancreatic cancers, or pancreatic cancer cell lines, respectively.
The Accession and Gene Name refer to representative GenBank entries that
contain the tag sequence.

Table 2 Transcripts increased in colorectal cancer. 

Table 3 Transcripts decreased in colorectal cancer. 

Table 4 Transcripts increased in pancreatic cancer. 

Table 5 Transcripts increased in pancreatic and colorectal cancer. 

DETAILED DESCRIPTION

The inventors have discovered sets of human genes which are either 
upregulated or downregulated in cancer cells, as compared to normal cells. 
Specifically, certain genes have been found to be upregulated or downregulated 
in colorectal and/or pancreatic cancer cells, when compared to normal colon 
cells. These sets of differentially regulated genes can be used as diagnostic
markers, either individually or in sets of, for example, 2, 5, 10, 20, or 30. 

Genes whose expression was detected to be increased in colorectal 
cancer are shown in Table 2. Genes whose expression was detected to be 
decreased in colorectal cancer are shown in Table 3. Genes whose expression
was detected as increased in pancreatic cancer are shown in Table 4. Genes 
whose expression was detected as increased in both pancreatic cancer and 
colorectal cancer are shown in Table 5. These latter genes likely play a role in 
neoplastic development generally. 
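Because Table 2 and Table 3 mark opposite directions of change, a set of markers can be scored by checking each tag in the direction its table specifies. The Python sketch below is illustrative only: it assumes SAGE-style tag counts for a test sample and a normal reference, normalizes each count by its library size, reuses the zero-to-1 substitution described for FIG. 1A, and applies a fold-change cutoff (`min_fold`) that is a hypothetical parameter, not a threshold taken from this disclosure. The tag names in the test data are placeholders, not entries from the tables.

```python
from typing import Dict, Iterable, Tuple


def normalized_fold_change(test: int, normal: int, test_size: int, normal_size: int) -> float:
    """Test/normal ratio of tag abundance after library-size normalization.

    Undetected tags (count 0) are assigned a count of 1 to avoid division by 0.
    """
    return (max(test, 1) / test_size) / (max(normal, 1) / normal_size)


def score_panel(
    test_counts: Dict[str, int],
    normal_counts: Dict[str, int],
    test_size: int,
    normal_size: int,
    increased_tags: Iterable[str],
    decreased_tags: Iterable[str],
    min_fold: float = 5.0,  # hypothetical cutoff, not from the disclosure
) -> Tuple[int, int]:
    """Return (markers changed in the cancer-associated direction, markers tested).

    increased_tags: tags expected to be higher in cancer (e.g. drawn from Table 2).
    decreased_tags: tags expected to be lower in cancer (e.g. drawn from Table 3).
    """
    hits = 0
    tested = 0
    for tag in increased_tags:
        tested += 1
        fc = normalized_fold_change(
            test_counts.get(tag, 0), normal_counts.get(tag, 0), test_size, normal_size
        )
        if fc >= min_fold:
            hits += 1
    for tag in decreased_tags:
        tested += 1
        fc = normalized_fold_change(
            test_counts.get(tag, 0), normal_counts.get(tag, 0), test_size, normal_size
        )
        if fc <= 1.0 / min_fold:
            hits += 1
    return hits, tested
```

Returning both the number of concordant markers and the number tested leaves the decision rule (for example, how many of a set of 5, 10, or 20 markers must agree) to the user, since the text does not specify one.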

Tag sequences, as provided herein, uniquely identify genes. This is due
to their length and their specific location (3') in the gene from which they are
drawn. The full-length genes can be identified by matching the tag to a gene
database entry, or by using the tag sequences as probes to physically isolate
previously unidentified genes from cDNA libraries. The methods by which
genes are isolated from libraries using DNA probes are well known in the art.
See, for example, Velculescu et al., Science 270: 484 (1995), and Sambrook et
al. (1989), MOLECULAR CLONING: A LABORATORY MANUAL, 2nd
ed. (Cold Spring Harbor Press, Cold Spring Harbor, New York). Once a gene
or transcript has been identified, either by matching to a database entry or by
physically hybridizing to a cDNA molecule, the position of the hybridizing or
matching region in the transcript can be determined. If the tag sequence is not
at the 3' end, immediately adjacent to the recognition site of the restriction
enzyme used to generate the SAGE tags, then a spurious match may have been
made. Confirmation of the identity of a SAGE tag can be made by comparing
the expression level of the tag with that of the identified gene in certain cell types.
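The 3'-position check described above can be sketched in Python. This sketch assumes, following the legend for Tables 2-5, that a tag consists of the NlaIII recognition site (CATG) plus the adjacent 11 bp, and that the matched transcript is given 5' to 3' on the sense strand; the function names are hypothetical. Only the 3'-most CATG is treated as the expected tag position, so a tag that occurs elsewhere in the transcript is flagged as a possibly spurious match.

```python
from typing import Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition site


def expected_tag(transcript: str, tag_len: int = 11) -> Optional[str]:
    """Return the anchor site plus the adjacent tag_len bases at the 3'-most CATG.

    Returns None if the transcript has no CATG or too little sequence follows it.
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)
    end = pos + len(ANCHOR_SITE) + tag_len
    if pos == -1 or end > len(seq):
        return None
    return seq[pos:end]


def classify_tag_match(transcript: str, tag: str, tag_len: int = 11) -> str:
    """Describe where a tag falls within a candidate transcript."""
    seq = transcript.upper()
    tag = tag.upper()
    if tag == expected_tag(seq, tag_len):
        return "3'-most site"
    if tag in seq:
        return "internal site (possibly spurious)"
    return "no match"
```

A match at an internal CATG does not prove the assignment is wrong (for example, alternative polyadenylation can change which site is 3'-most), which is why the text recommends confirming identity by comparing expression levels.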

In addition to the sequences shown in SEQ ID NOS: 1-732, or their
complements, this invention also provides the antisense polynucleotide strand,
e.g., antisense RNA to these sequences or their complements. One can obtain
an antisense RNA using the sequences provided in SEQ ID NOS: 1-732 and
the methodology described in van der Krol et al. (1988) BioTechniques 6:958.
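Writing out an antisense sequence amounts to complementing each base and reversing the order so the result reads 5' to 3'. A minimal Python sketch, assuming the input is the sense strand written 5' to 3' and contains only A, C, G, T, or U; the function name is hypothetical, and the sketch covers only sequence design, not the synthesis or delivery methods of the cited reference.

```python
_COMPLEMENT_TO_RNA = str.maketrans("ACGTU", "UGCAA")


def antisense_rna(sense: str) -> str:
    """Antisense RNA, written 5'->3', for a sense DNA or RNA sequence given 5'->3'."""
    seq = sense.upper()
    invalid = set(seq) - set("ACGTU")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement each base (T and U both pair with A), then reverse so the
    # antisense strand is also reported in the 5'->3' direction.
    return seq.translate(_COMPLEMENT_TO_RNA)[::-1]
```

For example, the sense sequence CATGGC yields the antisense RNA GCCAUG.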
The invention also encompasses polynucleotides which differ from
the polynucleotides described above, but which produce the same
phenotypic effect, such as an allelic variant. These altered, but phenotypically
equivalent polynucleotides are referred to as "equivalent nucleic acids." This
invention also encompasses polynucleotides characterized by changes in
non-coding regions that do not alter the phenotype of the polypeptide
produced therefrom when compared to the polynucleotide herein. This
invention further encompasses polynucleotides that hybridize to the
polynucleotides of the subject invention under conditions of moderate or high
stringency.

The polynucleotides can be conjugated to a detectable marker, e.g., an 
enzymatic label or a radioisotope for detection of nucleic acid and/or 
expression of the gene in a cell. A wide variety of appropriate detectable 
markers are known in the art, including fluorescent, radioactive, enzymatic or 
other ligands, such as avidin/biotin, which are capable of giving a detectable 
signal. In preferred embodiments, one will likely desire to employ a fluorescent
label or an enzyme tag, such as urease, alkaline phosphatase or peroxidase,
instead of radioactive or other environmentally undesirable reagents. In the case
of enzyme tags, colorimetric indicator substrates are known which can be
employed to provide a means, visible to the human eye or
spectrophotometrically, to identify specific hybridization with complementary
nucleic acid-containing samples. Briefly, this invention further provides a
method for detecting a single-stranded polynucleotide identified by SEQ ID
NOS: 1-732 or its complement, by contacting target single-stranded
polynucleotides with a labeled, single-stranded polynucleotide (a probe) which
comprises at least 10 nucleotides of the complement of SEQ ID NOS: 1-732 (or the
corresponding complement) under conditions permitting hybridization
(preferably moderately stringent hybridization conditions) of complementary
single-stranded polynucleotides, or more preferably, under highly stringent
hybridization conditions. Hybridized polynucleotide pairs are separated from
un-hybridized, single-stranded polynucleotides. The hybridized polynucleotide
pairs are detected using methods well known to those of skill in the art and set
forth, for example, in Sambrook et al. (1989) supra. 

The polynucleotides of this invention can be isolated using the 
technique described in the experimental section or replicated using PCR. The 
PCR technology is the subject matter of United States Patent Nos. 4,683,195,
4,800,159, 4,754,065, and 4,683,202 and is described in PCR: The Polymerase
Chain Reaction (Mullis et al. eds., Birkhauser Press, Boston (1994)) or
MacPherson et al. (1991) and (1994), supra, and references cited therein.
Alternatively, one of skill in the art can use the sequences provided herein and 
a commercial DNA synthesizer to replicate the DNA. Accordingly, this 
invention also provides a process for obtaining the polynucleotides of this 
invention by providing the linear sequence of the polynucleotide, nucleotides, 
appropriate primer molecules, chemicals such as enzymes and instructions for 
their replication and chemically replicating or linking the nucleotides in the 
proper orientation to obtain the polynucleotides. In a separate embodiment, 
these polynucleotides are further isolated. Still further, one of skill in the art 
can insert the polynucleotide into a suitable replication vector and insert the 
vector into a suitable host cell (procaryotic or eucaryotic) for replication and 
amplification. The DNA so amplified can be isolated from the cell by methods 
well known to those of skill in the art. A process for obtaining polynucleotides 
by this method is further provided herein as well as the polynucleotides so 
obtained. 

RNA can be obtained by first inserting a DNA polynucleotide into a 
suitable host cell. The DNA can be inserted by any appropriate method, e.g., 
by the use of an appropriate gene delivery vector or by electroporation. When
the cell replicates and the DNA is transcribed into RNA, the RNA can then be
isolated using methods well known to those of skill in the art, for example, as
set forth in Sambrook et al. (1989) supra. For instance, mRNA can be isolated
using various lytic enzymes or chemical solutions according to the procedures
set forth in Sambrook et al. (1989), supra, or extracted by nucleic-acid-binding
resins following the accompanying instructions provided by manufacturers.
Polynucleotides having at least 10 nucleotides and exhibiting sequence
complementarity or homology to SEQ ID NOS: 1-732 find utility as
hybridization probes. In some aspects, the full coding sequence of the
transcript, i.e., for SEQ ID NOS: 1-732, is known. Accordingly, any portion
of the known sequences available in GenBank, or homologous sequences, can
be used in the methods of this invention.

It is known in the art that a "perfectly matched" probe is not needed for 
a specific hybridization. Minor changes in probe sequence achieved by 
substitution, deletion or insertion of a small number of bases do not affect the 
hybridization specificity. In general, as much as 20% base-pair mismatch
(when optimally aligned) can be tolerated. Preferably, a probe useful for
detecting the aforementioned mRNA is at least about 80% identical to the
homologous region of comparable size contained in the previously identified
sequences identified by SEQ ID NOS: 1-732, which correspond to previously
characterized genes or SEQ ID NOS: 1-732, which correspond to known
ESTs. More preferably, the probe is 85% identical to the corresponding gene
sequence after alignment of the homologous region; even more preferably, it
exhibits 90% identity.
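The identity thresholds above (about 80%, 85%, or 90%) can be checked once a probe has been aligned to the homologous region. This Python sketch assumes the alignment has already been produced elsewhere and is supplied as two equal-length strings with '-' marking gaps; computing an optimal alignment is outside its scope. Counting gaps as mismatches over the full aligned length is one of several common conventions and is an assumption here, as are the function names.

```python
def percent_identity(probe: str, target: str) -> float:
    """Percent identity between two sequences that are already aligned.

    Both strings must have the same length; '-' marks a gap. Identity is
    computed over the full aligned length, so gap positions count as mismatches.
    """
    if len(probe) != len(target):
        raise ValueError("aligned sequences must have equal length")
    if not probe:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(probe.upper(), target.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(probe)


def meets_identity(probe: str, target: str, minimum: float = 80.0) -> bool:
    """True if the aligned probe reaches the requested percent identity."""
    return percent_identity(probe, target) >= minimum
```

For instance, a 10-nt aligned probe with two mismatched positions scores 80%, which meets the 80% level but not the 85% level.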

These probes can be used in radioassays (e.g. Southern and Northern 
blot analysis) to detect, prognose, diagnose or monitor various pancreatic or 
colon cells or tissue containing these cells. The probes also can be attached to 
a solid support or an array such as a chip for use in high throughput screening 
assays for the detection of expression of the gene corresponding to one or 
more polynucleotides of this invention. Accordingly, this invention also
provides at least one of the transcripts identified as SEQ ID NOS: 1-732, or its 
complement, attached to a solid support for use in high throughput screens. 

The total size of fragment, as well as the size of the complementary 
stretches, will depend on the intended use or application of the particular 
nucleic acid segment. Smaller fragments will generally find use in hybridization 
embodiments, wherein the length of the complementary region may be varied, 
such as between about 10 and about 100 nucleotides, or even full length
according to the complementary sequences one wishes to detect. 

Nucleotide probes having complementary sequences over stretches 
greater than 10 nucleotides in length are generally preferred, so as to increase 
stability and selectivity of the hybrid, and thereby improve the specificity of
particular hybrid molecules obtained. More preferably, one can design 
polynucleotides having gene-complementary stretches of more than 50 
nucleotides in length, or even longer where desired. Such fragments may be 
readily prepared by, for example, directly synthesizing the fragment by 
chemical means, by application of nucleic acid reproduction technology, such 
as the PCR technology with two priming oligonucleotides as described in U.S. 
Pat. No. 4,603,102 or by introducing selected sequences into recombinant 
vectors for recombinant production. A preferred probe is about 50-75 or, more
preferably, 50-100 nucleotides in length.

The polynucleotides of the present invention can serve as primers for 
the detection of genes or gene transcripts that are expressed in pancreatic or 
colon cells. In this context, amplification means any method employing a 
primer-dependent polymerase capable of replicating a target sequence with 
reasonable fidelity. Amplification may be carried out by natural or recombinant 
DNA polymerases such as T7 DNA polymerase, Klenow fragment of E. coli
DNA polymerase, and reverse transcriptase. 

A preferred amplification method is PCR. However, PCR conditions 
used for each reaction are empirically determined. A number of parameters 
influence the success of a reaction. Among them are annealing temperature 
and time, extension time, Mg2+ and ATP concentrations, pH, and the relative
concentration of primers, templates, and deoxyribonucleotides. After 
amplification, the resulting DNA fragments can be detected by agarose gel 
electrophoresis followed by visualization with ethidium bromide staining and 
ultraviolet illumination. 
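Because the annealing temperature is determined empirically, a rough primer melting temperature is often used as the starting point for that optimization. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a textbook approximation for short oligonucleotides that the text itself does not specify; the 5 °C offset in `starting_annealing_temp` is likewise an assumed rule of thumb, and both function names are hypothetical. The M13 (-40) universal primer sequence is used only as a familiar example.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in deg C: Tm = 2*(A+T) + 4*(G+C).

    The Wallace rule is only a rough guide, intended for short oligonucleotides
    (roughly 14-20 nt); it ignores salt, primer and Mg2+ concentrations.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def starting_annealing_temp(forward: str, reverse: str, offset: int = 5) -> int:
    """Rule-of-thumb starting point: a few degrees below the lower primer Tm."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


print(wallace_tm("GTTTTCCCAGTCACGAC"))  # 52 (8 A/T, 9 G/C)
```

The estimate only narrows the range to test; gradient or stepwise trials around it remain the empirical step the text describes.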

The invention further provides the isolated polynucleotide operatively 
linked to a promoter of RNA transcription, as well as other regulatory 
sequences for replication and/or transient or stable expression of the DNA or
RNA. As used herein, the term "operatively linked" means positioned in such
a manner that the promoter will direct transcription of RNA off the DNA
molecule. Examples of such promoters are SP6, T4 and T7. In certain
embodiments, cell-specific promoters are used for cell-specific expression of
the inserted polynucleotide. Vectors which contain a promoter or a
promoter/enhancer, with termination codons and selectable marker sequences,
as well as a cloning site into which an inserted piece of DNA can be operatively
linked to that promoter, are well known in the art and commercially available.
For general methodology and cloning strategies, see Gene Expression 
Technology (Goeddel ed., Academic Press, Inc. (1991)) and references cited 
therein and Vectors: Essential Data Series (Gacesa and Ramji, eds., John Wiley 
& Sons, N.Y. (1994)), which contains maps, functional properties, commercial 
suppliers and a reference to GenEMBL accession numbers for various suitable
vectors. Preferably, these vectors are capable of transcribing RNA in vitro or
in vivo.

Fragments of the sequences shown in SEQ ID NOS: 1-732 or their
respective complements also are encompassed by this invention, preferably
having at least 10 nucleotides and more preferably at least 18 nucleotides. Larger
polynucleotides, e.g., cDNA or genomic DNA, which hybridize under 
moderate or stringent conditions to the polynucleotide sequences shown in 
SEQ ID NOS: 1-732, or their respective complements, also are encompassed 
by this invention. 

In one embodiment, these fragments are polynucleotides that encode 
polypeptides or proteins having diagnostic and therapeutic utilities as described 
herein as well as probes to identify transcripts of the protein which may or may
not be present. These nucleic acid fragments can be prepared, for example, by
restriction enzyme digestion of the polynucleotide of SEQ ID NOS: 1-732, or
their complements, and then labeled with a detectable marker. Alternatively,
random fragments can be generated using nick translation of the molecule. For
methods of preparing and labeling such fragments, see Sambrook
et al. (1989), supra.
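The fragment sizes produced by restriction enzyme digestion can be predicted in silico before an enzyme is chosen. A minimal sketch, assuming a linear molecule, complete digestion and a single non-degenerate recognition site; EcoRI (G^AATTC) is used only as an example, the function name is hypothetical, and only top-strand cut positions are modeled, so overhangs are ignored.

```python
import re


def digest_fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths from a complete digest of a linear DNA molecule.

    site: non-degenerate recognition sequence (A/C/G/T only).
    cut_offset: top-strand cut position within the site, e.g. 1 for
    EcoRI, which cuts G^AATTC. Overhangs on the bottom strand are ignored.
    """
    seq = seq.upper()
    # The zero-width lookahead also finds overlapping occurrences of the site.
    pattern = "(?=" + re.escape(site.upper()) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Hypothetical 21-bp sequence with two EcoRI sites.
print(digest_fragment_lengths("AAAGAATTCTTTTGAATTCAA", "GAATTC", 1))  # [4, 10, 7]
```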

Expression vectors containing these nucleic acids are useful to obtain 
host vector systems to produce proteins and polypeptides. These expression
vectors must be replicable in the host organisms either as episomes or as an
integral part of the chromosomal DNA. Suitable expression
vectors include viral vectors, including adenoviruses, adeno-associated viruses, 
retroviruses, cosmids, etc. Adenoviral vectors are particularly useful for 
introducing genes into tissues in vivo because of their high levels of expression 
and efficient transformation of cells both in vitro and in vivo. When a nucleic 
acid is inserted into a suitable host cell, e.g., a procaryotic or a eucaryotic cell 
and the host cell replicates, the protein can be recombinantly produced. 
Suitable host cells will depend on the vector and can include mammalian cells, 
animal cells, human cells, simian cells, insect cells, yeast cells, and bacterial 
cells constructed using well known methods. See Sambrook et al. (1989) 
supra. In addition to the use of viral vectors for insertion of exogenous nucleic
acid into cells, the nucleic acid can be inserted into the host cell by methods
well known in the art, such as transformation for bacterial cells; transfection
using calcium phosphate precipitation or DEAE-dextran for mammalian cells;
electroporation; or microinjection. See Sambrook et al. (1989) supra for this
methodology. Thus, this invention also provides a host cell, e.g., a mammalian
cell, an animal cell (rat or mouse), a human cell, or a procaryotic cell such as
a bacterial cell, containing a polynucleotide encoding a protein or polypeptide
or antibody.

When the vectors are used for gene therapy in vivo or ex vivo, a 
pharmaceutically acceptable vector is preferred, such as a 
replication-incompetent retroviral or adenoviral vector. Pharmaceutically 
acceptable vectors containing the nucleic acids of this invention can be further 
modified for transient or stable expression of the inserted polynucleotide. As 
used herein, the term "pharmaceutically acceptable vector" includes, but is not 
limited to, a vector or delivery vehicle having the ability to selectively target 
and introduce the nucleic acid into dividing cells. An example of such a vector
is a "replication-incompetent" vector defined by its inability to produce viral
proteins, precluding spread of the vector in the infected host cell. An example
of a replication-incompetent retroviral vector is LNL6 (Miller, A.D. et al.
(1989) BioTechniques 7:980-990). The methodology of using
replication-incompetent retroviruses for retroviral-mediated gene transfer of
gene markers is well established (Correll et al. (1989) PNAS USA 86:8912;
Bordignon (1989) PNAS USA 86:8912-52; Culver, K. (1991) PNAS USA
88:3155; and Rill, D.R. (1991) Blood 79(10):2694-700). Clinical investigations
have shown that there are few or no adverse effects associated with the viral
vectors; see Anderson (1992) Science 256:808-13.

Compositions containing the polynucleotides of this invention, in 
isolated form or contained within a vector or host cell are further provided 
herein. When these compositions are to be used pharmaceutically, they are 
combined with a pharmaceutically acceptable carrier. 

This invention further encompasses genes, either genomic or cDNA, 
which code for a polypeptide or protein in the cell of interest. The genes 
specifically hybridize under moderate or stringent conditions to a 
polynucleotide identified by SEQ ID NOS: 1-732 or their respective 
complements. The process of identifying a larger fragment or the
full-length coding sequence to which the partial sequence depicted in SEQ ID
NOS: 1-732 hybridizes preferably involves the use of the methods and reagents
provided in this invention, either singly or in combination.

Five methods are disclosed herein which allow one of skill in the art
to isolate the gene or cDNA corresponding to the transcripts of the invention.

RACE-PCR Technique

One method to isolate the gene or cDNA which codes for a polypeptide
or protein and which corresponds to a transcript of this invention involves the
5'-RACE-PCR technique. In this technique, the poly-A mRNA that contains
the coding sequence of particular interest is first identified by hybridization to
a sequence disclosed herein and then reverse transcribed with a 3'-primer 
comprising the sequence disclosed herein. The newly synthesized cDNA strand 
is then tagged with an anchor primer of a known sequence, which preferably 
contains a convenient cloning restriction site attached at the 5' end. The tagged
cDNA is then amplified with the 3'-primer (or a nested primer sharing sequence
homology to the internal sequences of the coding region) and the 5'-anchor
primer. The amplification may be conducted under conditions of various levels
of stringency to optimize the amplification specificity. 5'-RACE-PCR can be
readily performed using commercial kits (available from, e.g., BRL Life
Technologies Inc. or Clontech) according to the manufacturer's instructions.
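The 3'-primer used for reverse transcription must base-pair with the mRNA, so it is written as the reverse complement of a stretch of the disclosed (sense-strand) sequence. A minimal sketch of that step; the 20-nt default primer length, the function names and the example sequence are illustrative assumptions, and practical primer design would also screen for secondary structure and primer-dimers.

```python
# Maps each base to its Watson-Crick partner; case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    if set(seq.upper()) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported in this sketch")
    return seq.translate(_COMPLEMENT)[::-1]


def gene_specific_primer(sense_seq: str, start: int, length: int = 20) -> str:
    """Antisense primer (5'->3') annealing to sense_seq[start:start + length].

    Because the primer must base-pair with the mRNA, it is the reverse
    complement of the chosen window of the sense-strand sequence.
    """
    window = sense_seq[start:start + length]
    if start < 0 or len(window) < length:
        raise ValueError("window must lie entirely within the sequence")
    return reverse_complement(window)


print(gene_specific_primer("ATGGCCAAATTT", 0, 6))  # GGCCAT
```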

Identification of Known Genes or ESTs

In addition, databases exist that reduce the complexity of ESTs by
assembling contiguous EST sequences into tentative genes. For example,
TIGR has assembled human ESTs into a database called THC for tentative
human consensus sequences. The THC database allows for a more definitive
assignment compared to ESTs alone. Software programs exist (give examples)
that allow for assembling ESTs into contiguous sequences from any organism.
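To make "assembling ESTs into contiguous sequences" concrete, here is a toy greedy overlap assembler: it repeatedly merges the two sequences with the longest exact suffix-prefix overlap. This is a generic textbook technique offered only as an illustration, not the TIGR procedure or any specific program; it assumes error-free reads from a single strand, and the function names, `min_len` default and example reads are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if < min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 5) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair with the longest exact overlap.

    Assumes error-free reads that all come from the same strand.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    contigs = list(reads)
    while True:
        # Drop exact duplicates, then reads wholly contained in another read.
        contigs = list(dict.fromkeys(contigs))
        contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


ests = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(ests, min_len=3))  # ['ATTAGACCTGCCGGAATAC']
```

Real EST clustering must also tolerate sequencing errors and reads from both strands, which is why dedicated assembly software is used in practice.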

Isolation of cDNA by Probing a Library with the SAGE Tag
Alternatively, mRNA from a sample preparation was used to construct a
cDNA library in the ZAP Express vector following the procedure described in
Velculescu et al. (1997) Science 270:484. The ZAP Express cDNA synthesis
kit (Stratagene) was used according to the manufacturer's protocol. Plates
containing 250 to 2000 plaques are hybridized as described in Rupert et al.
(1988) Mol. Cell. Biol. 8:3104 to oligonucleotide probes with the same
conditions previously described for standard probes except that the
hybridization temperature is reduced to room temperature. Washes are
performed in 6X standard saline citrate (SSC), 0.1% SDS for 30 minutes at room
temperature. The probes are labeled with 32P-ATP through use of T4
polynucleotide kinase.
Isolation of Partial cDNA (3' Fragment) by 3'-Directed PCR Reaction

This procedure is a modification of the protocol described in Polyak et 
al. (1997) Nature 389:300. Briefly, the procedure uses SAGE tags in PCR 
reaction such that the resultant PCR product contains the SAGE tag of interest 
as well as additional cDNA, the length of which is defined by the position of
the tag with respect to the 3' end of the cDNA. The cDNA product derived
from such a transcript-driven PCR reaction can be used for many applications.
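Since the length of the product is set by where the tag sits relative to the 3' end, it can be predicted from a known cDNA sequence. The sketch below assumes the conventions of the original SAGE method, namely NlaIII (CATG) as the anchoring enzyme and a 10-base tag taken immediately 3' of the 3'-most CATG; this passage does not restate those parameters, so treat them as assumptions. The function name and example sequence are hypothetical, and primer and adapter bases added during PCR are not counted.

```python
from typing import Optional, Tuple

ANCHOR_SITE = "CATG"  # NlaIII site; assumed anchoring enzyme
TAG_LENGTH = 10       # assumed tag length, not counting the CATG


def sage_tag_and_distance(cdna: str) -> Optional[Tuple[str, int]]:
    """Predict the SAGE tag of a sense-strand cDNA and its distance to the 3' end.

    The tag is taken as the TAG_LENGTH bases immediately 3' of the 3'-most
    CATG. The distance is counted from the first base of that CATG to the
    last base of the input sequence, so it approximates the cDNA portion of a
    3'-directed PCR product (primer and adapter sequence not included).
    """
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR_SITE)
    if pos == -1 or pos + len(ANCHOR_SITE) + TAG_LENGTH > len(seq):
        return None
    start = pos + len(ANCHOR_SITE)
    return seq[start:start + TAG_LENGTH], len(seq) - pos


# Hypothetical 26-nt cDNA 3' end with two CATG sites; the 3'-most one is used.
print(sage_tag_and_distance("AAACATGTTCATGACGTACGTAGGCC"))  # ('ACGTACGTAG', 17)
```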

RNA from a source believed to express the cDNA corresponding to a 
given tag is first converted to double-stranded cDNA using any standard 
cDNA protocol. Conditions similar to those used to generate cDNA for SAGE library
construction can be employed except that a modified oligo-dT primer is used
to drive the first strand synthesis. For example, the oligonucleotide of
composition 5'-B-TCC GGC GCG CCG TTT TCC CAG TCA CGA (T)30-3'
contains a poly-T stretch at the 3' end for hybridization and priming from
poly-A tails, an M13 priming site for use in subsequent PCR steps, a 5' biotin
label (B) for capture to streptavidin-coated magnetic beads, and an AscI
restriction endonuclease site for releasing the cDNA from the
streptavidin-coated magnetic beads. Theoretically, any sufficiently sized DNA
region capable of hybridizing to a PCR primer can be used, as can any other
endonuclease with an 8-base-pair recognition site.
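The features listed for the modified oligo-dT primer can be checked directly from its sequence. The sketch below locates the AscI site (GGCGCGCC) and measures the 3' poly-T run; it also estimates how often a given 8-bp site is expected by chance in random sequence (length / 4^8), which is one plausible reason, not stated in the text, for preferring an 8-bp cutter to release cDNA without cutting inside most inserts. Function names, the 2,000-bp insert length and the random-sequence model are assumptions; the 5' biotin is a chemical modification and is not part of the base string.

```python
import re

ASCI_SITE = "GGCGCGCC"  # AscI recognition sequence (cuts GG^CGCGCC)


def describe_oligo_dt_primer(bases: str) -> dict:
    """Report the AscI site position and 3' poly-T length of a primer.

    `bases` is the 5'->3' base sequence only; the 5' biotin (written "B" in
    the text) is a chemical modification and is not part of the string.
    """
    seq = bases.upper().replace(" ", "")
    poly_t = re.search(r"T+$", seq)
    return {
        "length": len(seq),
        "asci_site_index": seq.find(ASCI_SITE),  # -1 if absent
        "poly_t_3prime": len(poly_t.group()) if poly_t else 0,
    }


def expected_random_sites(sequence_length: int, site_length: int) -> float:
    """Expected occurrences of one specific site on one strand of random DNA."""
    return sequence_length / 4 ** site_length


primer = "TCC GGC GCG CCG TTT TCC CAG TCA CGA" + " " + "T" * 30
print(describe_oligo_dt_primer(primer))
# {'length': 57, 'asci_site_index': 3, 'poly_t_3prime': 30}
print(expected_random_sites(2000, 8))  # 0.030517578125
```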

cDNA constructed utilizing this or a similar modified oligo-dT primer is
then processed exactly as described in U.S. Patent No. (insert) up until adapter
ligation where only one adapter is ligated to the cDNA pool. After adapter 
ligation, the cDNA is released from the streptavidin-coated magnetic beads and 
is then used as a template for cDNA amplification. 

Various PCR protocols can be employed using PCR priming sites
within the 3' modified oligo-dT primer and the SAGE tag. The SAGE
tag-derived PCR primer employed can be of varying length, dictated by 5'
extension of the tag into the adaptor sequence. The resulting cDNA products
can then be used for a variety of applications.
This technique can be further modified by: (1) altering the length and/or
content of the modified oligo-dT primer; (2) ligating adaptors other than that
previously employed within the SAGE protocol; (3) performing PCR from
template retained on the streptavidin-coated magnetic beads; and (4) priming
first strand cDNA synthesis with non-oligo-dT based primers.

Isolation of cDNA using GeneTrapper or modified GeneTrapper Technology
The reagents and manufacturer's instructions for this technology are 
commercially available from Life Technologies, Inc., Gaithersburg, Maryland. 
Briefly, a complex population of single-stranded phagemid DNA containing 
directional cDNA inserts is enriched for the target sequence by hybridization 
in solution to a biotinylated oligonucleotide probe complementary to the target 
sequence. The hybrids are captured on streptavidin-coated paramagnetic 
beads. A magnet retrieves the paramagnetic beads from the solution, leaving 
nonhybridized single-stranded DNAs behind. Subsequently, the captured 
single-stranded DNA target is released from the biotinylated oligonucleotide. 
After release, the cDNA clone is further enriched by using a nonbiotinylated 
target oligonucleotide to specifically prime conversion of the single-stranded 
target to double-stranded DNA. Following transformation and plating,
typically 20% to 100% of the colonies represent the cDNA clone of interest. 
To identify the desired cDNA clone, the colonies may be screened by colony 
hybridization using the 32P-labeled oligonucleotide as described above for 
solution hybridization, or alternatively by DNA sequencing and alignment of 
all sequences obtained from numerous clones to determine a consensus 
sequence. 
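The 20% to 100% enrichment figure translates directly into how many colonies need to be screened. Assuming each colony is independently positive with probability p, the chance of picking at least one positive among n colonies is 1 - (1 - p)^n, so reaching 95% confidence requires n ≥ ln(0.05) / ln(1 - p). At the low end (p = 0.2) that ratio is about 13.4, so 14 colonies suffice. A minimal sketch of the calculation; the independence assumption, the 95% default and the function name are illustrative.

```python
import math


def colonies_to_screen(fraction_positive: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one positive among n colonies) >= confidence.

    Assumes each colony is independently positive with probability p, so
    P(at least one) = 1 - (1 - p) ** n.
    """
    p = fraction_positive
    if not (0 < p <= 1 and 0 < confidence < 1):
        raise ValueError("need 0 < fraction_positive <= 1 and 0 < confidence < 1")
    if p == 1:
        return 1
    n = max(math.ceil(math.log(1 - confidence) / math.log(1 - p)), 1)
    # Correct for floating-point error in the logarithm ratio, in either direction.
    while n > 1 and 1 - (1 - p) ** (n - 1) >= confidence:
        n -= 1
    while 1 - (1 - p) ** n < confidence:
        n += 1
    return n


for p in (0.2, 0.5, 1.0):
    print(p, colonies_to_screen(p))
# 0.2 14
# 0.5 5
# 1.0 1
```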

The genes which are identified herein as being differentially expressed 
in normal and cancer cells can be used diagnostically and prognostically. 
Transcription levels in a test sample suspected of being neoplastic can be 
determined and compared to the levels in normal colon cells. The test sample 
may be from any tissue suspected of neoplasia, and particularly from either 
suspected colorectal or suspected pancreatic cancer cells. The control cells for 
the purposes of comparison are normal cells, preferably of the same tissue type
as the test sample, e.g., colon cells or pancreatic duct epithelial cells.
Upregulation or downregulation of transcription is therefore
diagnostic of the neoplastic state, depending on which gene is used as a test
reagent. Similarly, transcription levels can be monitored to assess patient
responses to anti-tumor therapies. Transcription levels will also provide
prognostic information. For example, the level of transcription in a test sample
can be compared to levels found in bona fide normal and tumor cells. More
extreme deviations from normal expression levels indicate a poorer prognosis.
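The comparison above can be sketched for count-based data such as SAGE tag counts: normalize each count to library size, take the test/normal ratio, and report both the direction of change and its magnitude (|log2 fold|) as a crude measure of how far the sample deviates from normal. Whether "up" or "down" is the diagnostic direction depends on the gene. The 2-fold cut-off, the pseudocount, the per-100,000 scaling and the function names are hypothetical illustration choices, not values or statistical tests taken from the text.

```python
import math


def per_100k(count: float, total_tags: int) -> float:
    """Express a tag or read count per 100,000 total tags in its library."""
    return count * 100_000 / total_tags


def call_expression_change(test_count: int, test_total: int,
                           normal_count: int, normal_total: int,
                           min_fold: float = 2.0, pseudocount: float = 1.0):
    """Return (direction, fold, |log2 fold|) for a test vs. normal comparison.

    min_fold and pseudocount are illustrative defaults, not values from the
    text; the pseudocount avoids division by zero for undetected transcripts.
    """
    test = per_100k(test_count + pseudocount, test_total)
    normal = per_100k(normal_count + pseudocount, normal_total)
    fold = test / normal
    if fold >= min_fold:
        direction = "up"
    elif fold <= 1 / min_fold:
        direction = "down"
    else:
        direction = "unchanged"
    return direction, fold, abs(math.log2(fold))


print(call_expression_change(5, 50_000, 40, 50_000)[0])  # down
```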

Transcription levels can be determined according to any means known 
in the art. These include, without limitation, Northern blots, nuclear run-on 
assays, in vitro transcription assays, primer extension assays, quantitative 
reverse transcriptase-polymerase chain reactions (RT-PCR), and hybrid filter 
binding assays. These techniques are well known in the art. See J.C. Alwine,
D.J. Kemp, G.R. Stark, Proc. Natl. Acad. Sci. USA 74, 5350 (1977); K.
Zinn, D. DiMaio, T. Maniatis, Cell 34, 865 (1983); G. Veres, R.A. Gibbs,
S.E. Scherer, C.T. Caskey, Science 237, 415 (1987).
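For the quantitative RT-PCR option, one widely used way to convert threshold cycles (Ct) into a relative transcription level is the comparative Ct (2^-ΔΔCt) method. The text does not prescribe it; the sketch assumes near-100% amplification efficiency (a factor of 2 per cycle) and a reference transcript expressed equally in test and normal samples, whose identity is left open. The function name and Ct values are hypothetical.

```python
def relative_expression_ddct(ct_target_test: float, ct_ref_test: float,
                             ct_target_normal: float, ct_ref_normal: float,
                             amplification_factor: float = 2.0) -> float:
    """Fold change of a target transcript (test vs. normal) by the comparative Ct method.

    dCt = Ct(target) - Ct(reference) in each sample; ddCt = dCt(test) - dCt(normal);
    fold = amplification_factor ** (-ddCt). A factor of 2.0 assumes ~100% efficiency.
    """
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_normal = ct_target_normal - ct_ref_normal
    dd_ct = d_ct_test - d_ct_normal
    return amplification_factor ** (-dd_ct)


# After normalization the target crosses threshold 3 cycles earlier in the test sample.
print(relative_expression_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```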

Similarly, upregulated genes and downregulated genes can be detected 
by measuring expression of their protein products. This can be done by any 
means known in the art, including but not limited to Western (immuno) blot, 
enzyme-linked immunosorbent assay, radioimmunoassay, and enzyme assay.
Such techniques are well known in the art. Protein products can be detected 
in tissue samples of a test patient, using a suspect sample as a test sample, and 
a matched normal tissue sample from the same tissue type as a control. If 
normal tissue is not available then a closely related tissue type can be used. 
Desirably both the samples being compared will be from the same individual. 
Alternatively, aberrant expression levels of protein products can be detected in 
body samples, such as blood, serum, feces, urine, or sputum. As a control, a
normal matched sample can be used from a healthy individual. Aberrant 
expression levels of transcripts can also be detected in such body samples, 
particularly in blood and serum. 
Probes for use in the assays for transcription levels of particular genes
or sets of genes may be RNA or DNA. The probes will be isolated
substantially free of other cellular RNAs or DNAs. If the reagent contains one
probe, then it will comprise at least 50% of the nucleic acids in the reagent
composition. If the reagent contains more than one probe, then the proportion
of each probe will decrease accordingly, so that the specific probes together will
still comprise at least 50% of the nucleic acids in the reagent composition.

Probes can be labeled according to any means known in the art. These 
may include radioactive labels, fluorescent labels, enzymatic labels, and binding 
partner labels such as biotin. Means for labeling and detecting probes are well 
known in the art. Probes comprise at least 10, 11, 12, 15, 20, or 30 contiguous
nucleotides of a selected gene.

This invention provides proteins or polypeptides expressed from the 
polynucleotides of this invention, which is intended to include wild-type and 
recombinantly produced polypeptides and proteins from procaryotic and 
eucaryotic host cells, as well as muteins, analogs and fragments thereof. In 
some embodiments, the term also includes antibodies and anti-idiotypic 
antibodies. 

It is understood that functional equivalents or variants of the wild-type 
polypeptide or protein also are within the scope of this invention, for example, 
those having conservative amino acid substitutions. Other analogs include 
fusion proteins comprising a protein or polypeptide. 

The proteins and polypeptides of this invention are obtainable by a 
number of processes well known to those of skill in the art, which include 
purification, chemical synthesis and recombinant methods. Full length proteins 
can be purified from a colon or pancreatic cell or tissue lysate by methods such 
as immunoprecipitation with antibody, and standard techniques such as gel 
filtration, ion-exchange, reversed-phase, and affinity chromatography using a 
fusion protein as shown herein. For such methodology, see for example 
Deutscher et al. (1999) Guide To Protein Purification: Methods In 
Enzymology (Vol. 182, Academic Press). Accordingly, this invention also 
provides the processes for obtaining these proteins and polypeptides as well as
the products obtainable and obtained by these processes. 

The proteins and polypeptides also can be obtained by chemical 
synthesis using a commercially available automated peptide synthesizer such 
as those manufactured by Perkin Elmer/Applied Biosystems, Inc., Model 430A
or 431A, Foster City. The synthesized protein or polypeptide can be
precipitated and further purified, for example by high performance liquid 
chromatography (HPLC). Accordingly, this invention also provides a process 
for chemically synthesizing the proteins of this invention by providing the 
sequence of the protein and reagents, such as amino acids and enzymes and 
linking together the amino acids in the proper orientation and linear sequence. 

Alternatively, the proteins and polypeptides can be obtained by 
well-known recombinant methods as described, for example, in Sambrook et 
al. (1989), supra, using the host cell and vector systems described above.

Also provided by this application are the polypeptides and proteins 
described herein conjugated to a detectable agent for use in the diagnostic 
methods. For example, detectably labeled proteins and polypeptides can be 
bound to a column and used for the detection and purification of antibodies. 
They also are useful as immunogens for the production of antibodies as 
described below. The proteins and fragments of this invention are useful in an 
in vitro assay system to screen for agents or drugs that modulate cellular
processes. 

The proteins of this invention also can be combined with various liquid 
phase carriers, such as sterile or aqueous solutions, pharmaceutically 
acceptable carriers, suspensions and emulsions. Examples of non-aqueous 
solvents include propyl ethylene glycol, polyethylene glycol and vegetable oils. 
When used to prepare antibodies, the carriers also can include an adjuvant that 
is useful to non-specifically augment a specific immune response. A skilled 
artisan can easily determine whether an adjuvant is required and select one. 
However, for the purpose of illustration only, suitable adjuvants include, but 
are not limited to, Freund's Complete and Incomplete, mineral salts and
polynucleotides. 

This invention also provides a pharmaceutical composition comprising 
any of a protein, analog, mutein, polypeptide fragment, antibody, antibody 
fragment or anti-idiotypic antibody of this invention, alone or in combination
with each other or other agents, and an acceptable carrier. These compositions 
are useful for various diagnostic and therapeutic methods. 

Antibodies can be generated using the proteins encoded by the 
transcripts identified by the tags disclosed herein. Use of all or portions of the 
protein as immunogens is routine in the art. Similarly, fusion proteins can be 
used as immunogens. Antibodies can be affinity purified using the proteins or 
portions thereof used as immunogens. Similarly, monoclonal antibodies 
specifically immunoreactive with the protein sequences of the invention can be
generated according to techniques which are well known in the art. 

Antibodies can be used analytically to quantitate the expression of 
particular transcripts identified herein as upregulated or downregulated in 
cancer. In addition, antibodies can be conjugated or non-covalently linked to 
cytotoxic agents, such as cytotoxins, radionuclides, chemotherapeutic drugs, 
etc. Such antibodies can be used therapeutically to specifically target cancer 
cells in which the protein antigens are upregulated. These include the proteins 
encoded by the transcripts identified by the tags shown in Tables 2, 4, and 5. 
Means of making such linked cytotoxic antibodies and of administering the 
same are well known in the art. 

Also provided by this invention is an antibody capable of specifically 
forming a complex with the proteins or polypeptides as described above. The 
term "antibody" includes polyclonal antibodies and monoclonal antibodies. 
The antibodies include, but are not limited to, mouse, rat, rabbit, or human
antibodies.

Laboratory methods for producing polyclonal antibodies and 
monoclonal antibodies, as well as deducing their corresponding nucleic acid 
sequences, are known in the art, see Harlow and Lane (1988) supra and 
Sambrook et al. (1989) supra. The monoclonal antibodies of this invention can
be biologically produced by introducing protein or a fragment thereof into an 
animal, e.g., a mouse or a rabbit. The antibody producing cells in the animal 
are isolated and fused with myeloma cells or heteromyeloma cells to produce 
hybrid cells or hybridomas. Accordingly, the hybridoma cells producing the 
monoclonal antibodies of this invention also are provided. 

Thus, using the protein or fragment thereof and well known methods, 
one of skill in the art can produce and screen the hybridoma cells and 
antibodies of this invention for antibodies having the ability to bind the proteins 
or polypeptides. 

If a monoclonal antibody being tested binds with the protein or 
polypeptide, then the antibody being tested and the antibodies provided by the 
hybridomas of this invention are equivalent. It also is possible to determine 
without undue experimentation, whether an antibody has the same specificity 
as the monoclonal antibody of this invention by determining whether the 
antibody being tested prevents a monoclonal antibody of this invention from 
binding the protein or polypeptide with which the monoclonal antibody is 
normally reactive. If the antibody being tested competes with the monoclonal 
antibody of the invention as shown by a decrease in binding by the monoclonal 
antibody of this invention, then it is likely that the two antibodies bind to the 
same or a closely related epitope. Alternatively, one can pre-incubate the 
monoclonal antibody of this invention with a protein with which it is normally 
reactive, and determine if the monoclonal antibody being tested is inhibited in 
its ability to bind the antigen. If the monoclonal antibody being tested is 
inhibited then, in all likelihood, it has the same, or a closely related, epitopic 
specificity as the monoclonal antibody of this invention. 
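The competition test just described reduces to comparing the binding signal of the reference monoclonal antibody with and without the test antibody present. A minimal sketch of the arithmetic, assuming a background-subtracted signal (for example an absorbance reading); the 50% cut-off in `likely_shared_epitope` is a hypothetical placeholder that would in practice be set from assay controls, and all names and readings are illustrative.

```python
def percent_inhibition(signal_alone: float, signal_with_competitor: float,
                       background: float = 0.0) -> float:
    """Percent reduction in specific binding of the labeled reference antibody."""
    specific_alone = signal_alone - background
    if specific_alone <= 0:
        raise ValueError("reference signal must exceed background")
    specific_competed = max(signal_with_competitor - background, 0.0)
    return 100.0 * (1.0 - specific_competed / specific_alone)


def likely_shared_epitope(inhibition: float, threshold: float = 50.0) -> bool:
    """Hypothetical cut-off; a real threshold would be set from assay controls."""
    return inhibition >= threshold


inh = percent_inhibition(1.20, 0.45, background=0.10)
print(round(inh, 1), likely_shared_epitope(inh))  # 68.2 True
```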

The term "antibody" also is intended to include antibodies of all 
isotypes. Particular isotypes of a monoclonal antibody can be prepared either 
directly by selecting from the initial fusion, or prepared secondarily, from a 
parental hybridoma secreting a monoclonal antibody of different isotype by 
using the sib selection technique to isolate class switch variants using the 
procedure described in Steplewski et al. (1985) Proc. Natl. Acad. Sci. 82:8653
or Spira et al. (1984) J. Immunol. Methods 74:307. 

This invention also provides biologically active fragments of the
polyclonal and monoclonal antibodies described above. These "antibody
fragments" retain some ability to selectively bind with their antigen or
immunogen. Such antibody fragments can include, but are not limited to: 

(1) Fab, 

(2) Fab',

(3) F(ab')2, 

(4) Fv, and 

(5) SCA 

A specific example of "a biologically active antibody fragment" is a 
CDR region of the antibody. Methods of making these fragments are known 
in the art, see for example, Harlow and Lane, (1988) supra. 

The antibodies of this invention also can be modified to create chimeric 
antibodies and humanized antibodies (Oi, et al. (1986) BioTechniques 
4(3):214). Chimeric antibodies are those in which the various domains of the 
antibodies' heavy and light chains are coded for by DNA from more than one 
species. 

The isolation of other hybridomas secreting monoclonal antibodies with 
the specificity of the monoclonal antibodies of the invention can also be 
accomplished by one of ordinary skill in the art by producing anti-idiotypic 
antibodies (Herlyn, et al. (1986) Science 232:100). An anti-idiotypic antibody
is an antibody which recognizes unique determinants present on the 
monoclonal antibody produced by the hybridoma of interest. 

Idiotypic identity between monoclonal antibodies of two hybridomas 
demonstrates that the two monoclonal antibodies are the same with respect to 
their recognition of the same epitopic determinant. Thus, by using antibodies 
to the epitopic determinants on a monoclonal antibody it is possible to identify 
other hybridomas expressing monoclonal antibodies of the same epitopic 
specificity. 
It is also possible to use the anti-idiotype technology to produce
monoclonal antibodies which mimic an epitope. For example, an anti-idiotypic 
monoclonal antibody made to a first monoclonal antibody will have a binding 
domain in the hypervariable region which is the mirror image of the epitope 
bound by the first monoclonal antibody. Thus, in this instance, the 
anti-idiotypic monoclonal antibody could be used for immunization for 
production of these antibodies. 

As used in this invention, the term "epitope" is meant to include any 
determinant having specific affinity for the monoclonal antibodies of the 
invention. Epitopic determinants usually consist of chemically active surface
groupings of molecules such as amino acids or sugar side chains and usually 
have specific three dimensional structural characteristics, as well as specific 
charge characteristics. 

The antibodies of this invention can be linked to a detectable agent or 
label. There are many different labels and methods of labeling known to those
of ordinary skill in the art. 

The antibody-label complex is useful to detect the protein or fragments 
in a sample, using standard immunochemical techniques such as 
immunohistochemistry as described by Harlow and Lane (1988) supra. 
Competitive and non-competitive immunoassays in either a direct or indirect 
format are examples of such assays, e.g., enzyme-linked immunosorbent assay (ELISA),
radioimmunoassay (RIA), and the sandwich (immunometric) assay. Those of
skill in the art will know, or can readily discern, other immunoassay formats 
without undue experimentation. 
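In quantitative formats such as ELISA or RIA, the amount of protein in a sample is usually read off a standard curve built from known concentrations. A minimal sketch using piecewise-linear interpolation between standards; this is a simplification chosen for brevity (four-parameter logistic fits are also common), signals outside the curve are rejected rather than extrapolated, and the function name, units and standard values are hypothetical.

```python
from bisect import bisect_left


def concentration_from_standard_curve(standards, signal):
    """Piecewise-linear interpolation of an unknown's concentration.

    standards: (concentration, signal) pairs whose signals increase
    strictly with concentration. Signals outside the curve are rejected
    rather than extrapolated.
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the standard curve; dilute or re-assay")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical standards: (ng/mL, absorbance).
curve = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(round(concentration_from_standard_curve(curve, 0.35), 2))  # 15.0
```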

The coupling of antibodies to low molecular weight haptens can 
increase the sensitivity of the assay. The haptens can then be specifically 
detected by means of a second reaction. For example, it is common to use 
haptens such as biotin, which reacts with avidin, or dinitrophenyl, pyridoxal, and
fluorescein, which can react with specific anti-hapten antibodies. See Harlow 
and Lane (1988) supra. 
The monoclonal antibodies of the invention also can be bound to many
different carriers. Thus, this invention also provides compositions containing 
the antibodies and another substance, active or inert. Examples of well-known 
carriers include glass, polystyrene, polypropylene, polyethylene, dextran, nylon, 
amylases, natural and modified celluloses, polyacrylamides, agaroses and 
magnetite. The nature of the carrier can be either soluble or insoluble for 
purposes of the invention. Those skilled in the art will know of other suitable 
carriers for binding monoclonal antibodies, or will be able to ascertain such, 
using routine experimentation. 

Compositions containing the antibodies, fragments thereof or cell lines 
which produce the antibodies, are encompassed by this invention. When these 
compositions are to be used pharmaceutically, they are combined with a 
pharmaceutically acceptable carrier. 

The present invention also provides a screen for various agents which 
modulate the expression of a gene in a pancreatic or colon cell. To practice the 
method in vitro, suitable cell cultures or tissue cultures are first provided. The 
cell can be a cultured cell or a genetically modified cell in which a transcript 
from SEQ ID NOS: 1-732, or their complements, is expressed. Alternatively, 
the cells can be from a tissue biopsy. The cells are cultured under suitable conditions 
(temperature, growth or culture medium, and gas (CO2)) and for an appropriate 
amount of time to attain exponential proliferation without density-dependent 
constraints. It also is desirable to maintain an additional, separate cell culture 
that does not receive the agent being tested, to serve as a control. 

As is apparent to one of skill in the art, suitable cells may be cultured 
in microtiter plates and several agents may be assayed at the same time by 
noting genotypic changes, phenotypic changes or cell death. 

When the agent is a composition other than a DNA or RNA, the agent 
may be directly added to the cell culture or added to culture medium for 
addition. As is apparent to those skilled in the art, an "effective" amount must 
be added which can be empirically determined. When the agent is a 
polynucleotide, it may be directly added by use of a gene gun or 
electroporation. Alternatively, it may be inserted into the cell using a gene 
delivery vehicle or vector as described above. 

An agent is a potential therapeutic if it alters the expression of a gene in 
the cell. Altered expression can be detected by assaying for altered mRNA 
expression or protein expression using the probes, primers and antibodies as 
described herein. 
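The comparison between treated and control cultures can be illustrated as a simple fold-change screen. What follows is a minimal, hypothetical sketch, not a procedure given in this specification. The function names, the 2-fold threshold and the pseudocount are illustrative assumptions. A real assay would also need replicate measurements and a statistical test.

```python
from typing import Dict, List


def fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Ratio of treated to control expression; the pseudocount avoids division by zero."""
    return (treated + pseudocount) / (control + pseudocount)


def flag_agents(control: Dict[str, float],
                treated: Dict[str, Dict[str, float]],
                min_fold: float = 2.0) -> List[str]:
    """Return agents that change any monitored transcript by >= min_fold in either direction.

    control: transcript -> level in the untreated control culture
    treated: agent -> (transcript -> level in the culture exposed to that agent)
    min_fold: hypothetical threshold chosen for illustration only
    """
    flagged = []
    for agent, levels in treated.items():
        for transcript, control_level in control.items():
            # A transcript missing from the treated profile is treated as undetected (0).
            ratio = fold_change(levels.get(transcript, 0.0), control_level)
            if ratio >= min_fold or ratio <= 1.0 / min_fold:
                flagged.append(agent)
                break  # one altered transcript is enough to flag the agent
    return sorted(flagged)


control = {"T1": 10.0, "T2": 100.0}  # hypothetical transcript levels
treated = {
    "agent_a": {"T1": 10.0, "T2": 100.0},  # unchanged
    "agent_b": {"T1": 40.0, "T2": 100.0},  # T1 up about 3.7-fold
    "agent_c": {"T1": 10.0, "T2": 30.0},   # T2 down about 3.3-fold
}
print(flag_agents(control, treated))  # ['agent_b', 'agent_c']
```

Flagging on either direction mirrors the text, which counts any altered expression, whether increased or decreased, as a potential therapeutic effect.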

For the purposes of this invention, an "agent" is intended to include, but 
not be limited to, a biological or chemical compound such as a simple or 
complex organic or inorganic molecule, a peptide, a protein (e.g. antibody) or 
a polynucleotide (e.g. anti-sense). A vast array of compounds can be 
synthesized, for example polymers, such as polypeptides and polynucleotides, 
and synthetic organic compounds based on various core structures, and these 
are also included in the term "agent". In addition, various natural sources can 
provide compounds for screening, such as plant or animal extracts, and the 
like. It should be understood, although not always explicitly stated, that the 
agent may be used alone or in combination with another agent having the same or 
different biological activity as the agents identified by the inventive screen. The 
agents and methods also are intended to be combined with other therapies. 

The above disclosure generally describes the present invention. A more 
complete understanding can be obtained by reference to the following specific 
examples which are provided herein for purposes of illustration only, and are 
not intended to limit the scope of the invention. 

EXAMPLE 1 

This example demonstrates the characterization of the general 
transcription of human colorectal epithelium, colorectal cancers, and pancreatic 
cancers. 

We used the recently developed SAGE (serial analysis of gene 
expression) method to identify and quantify a total of 303,706 transcripts 
derived from human colorectal (CR) epithelium, CR cancers or pancreatic 
cancers (Table 1A) (3). These transcripts represented approximately 48,741 
different genes (4) that ranged in average expression from 1 copy per cell to as 
many as 5,300 copies per cell (5). The number of different transcripts observed 
in each cell population varied from 14,247 to 20,471. The bulk of the mRNA 
mass (75%) consisted of transcripts expressed at more than five copies per cell 
on average (Table 1B). In contrast, the majority (86%) of transcripts were 
expressed at fewer than five copies per cell, but in aggregate this low-abundance 
class represented only 25% of the mRNA mass. This distribution was 
consistently observed among the different samples analyzed and was consistent 
with previous studies of RNA abundance classes based on RNA-DNA 
reassociation kinetics (Rot curves) (6). Monte Carlo simulations revealed that our 
analyses had a 92% probability of detecting a transcript expressed at an 
average of three copies per cell (7). 
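Two of the conversions behind this paragraph can be reproduced with simple arithmetic: the tag-error correction of note (4) and the copies-per-cell conversion of note (5). The sketch below takes its inputs from those notes: 10-base tags, a per-base error rate of about 0.7%, and 300,000 transcripts per cell. The function names and the toy tag counts in the abundance-class helper are hypothetical.

```python
from typing import Dict, Tuple

TRANSCRIPTS_PER_CELL = 300_000  # estimate used in note (5)


def tag_error_rate(base_error: float = 0.007, tag_length: int = 10) -> float:
    """Probability that a tag carries at least one sequencing error (note 4)."""
    return 1 - (1 - base_error) ** tag_length


def estimated_genes(distinct_tags: int, total_tags: int, error_rate: float) -> int:
    """Conservative gene count: subtract the maximum number of error-derived tags."""
    return distinct_tags - round(error_rate * total_tags)


def copies_per_cell(tag_count: int, total_tags: int,
                    transcripts_per_cell: int = TRANSCRIPTS_PER_CELL) -> float:
    """Average copies per cell = (observed tags / total tags) * transcripts per cell."""
    return tag_count * transcripts_per_cell / total_tags


def abundance_split(counts: Dict[str, int], threshold: float = 5.0) -> Tuple[float, float]:
    """Return (fraction of distinct transcripts at or below `threshold` copies per cell,
    fraction of total mRNA mass contributed by transcripts above `threshold`)."""
    total = sum(counts.values())
    high = [c for c in counts.values() if copies_per_cell(c, total) > threshold]
    low_fraction = 1 - len(high) / len(counts)
    high_mass = sum(high) / total
    return low_fraction, high_mass


rate = round(tag_error_rate(), 3)               # 0.068, i.e. 6.8%
print(estimated_genes(69_393, 303_706, rate))   # 48741
print(copies_per_cell(10, 60_000))              # 50.0
```

The first printed value matches the approximately 48,741 genes stated above. For the abundance classes, a skewed library behaves as Table 1B describes: many transcripts sit in the low-abundance class, yet most of the mRNA mass comes from the few abundant ones.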
Many of the SAGE tags appeared to represent previously undescribed 
transcripts, as only 54% of the tags matched entries in GenBank (Table 1). 
Twenty percent of these matching transcripts corresponded to characterized 
mRNA sequence entries in GenBank, whereas 80% matched uncharacterized 
EST entries. As expected, the likelihood of a tag being present in the 
databases was related to abundance; GenBank matches were identified for 98% 
of the transcripts expressed at more than 500 copies per cell but for only 51% 
of the transcripts expressed at < 5 copies per cell. Because the SAGE data 
provide a quantitative assay of transcript abundance, unaffected by differences 
in cloning or PCR efficiency, these data provide an independent and relatively 
unbiased estimate of the current completeness of publicly available EST 
databases. 

EXAMPLE 2 

This example demonstrates a comparison of the expression pattern of 
normal colon epithelium and primary colon cancers. 

Comparison of expression patterns between normal colon epithelium 
and primary colon cancers revealed that the majority of transcripts were 
expressed at similar levels (Fig. 1A). However, the expression profiles also 
revealed 289 transcripts that were expressed at significantly different levels [P 
< 0.01, (8)]. Of these 289, 181 were decreased in colon tumors compared to 
normal colon (average decrease 10-fold; Fig. 1B; examples in Fig. 2A). 
Conversely, 108 transcripts were expressed at higher levels in the colon 
cancers than in normal colon (average increase 13-fold; Fig. 1C; examples in 
Fig. 2A). Monte Carlo simulations indicated that the analysis would have 
detected over 95% of transcripts whose expression differed by 6-fold or more 
between normal and tumor cells, in either direction (9). Because relatively stringent criteria 
were used for defining differences [P < 0.01, (8)], the number of differences 
reported above is likely to be an underestimate. 
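Notes (8) and (9) describe the significance test only in outline. The sketch below is one possible reading of it. It rests on assumptions that are not in the original: (i) under the null hypothesis both libraries sample the same abundance, estimated by pooling the two observed counts; (ii) tag counts follow a Poisson approximation to binomial sampling; (iii) the test statistic is the absolute difference in tag abundance. The sketch omits the authors' step of converting p-chance into an absolute probability with 40 simulated experiments, so its output is not the P value reported above. Three inputs come from the notes: the 0.0005 cutoff (note 8), the 0.0001 and 0.0006 abundances (note 9), and roughly 60,000 tags per tissue (note 3: two patients with about 30,000 tags each). Function names are hypothetical.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here (lam < ~700)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def p_chance(x: int, y: int, n1: int, n2: int, sims: int = 100_000, seed: int = 0) -> float:
    """Fraction of null simulations whose abundance difference is >= the observed one.

    x, y: tag counts for one transcript in libraries of n1 and n2 total tags.
    Assumption: the null abundance is the pooled estimate (x + y) / (n1 + n2).
    """
    rng = random.Random(seed)
    pooled = (x + y) / (n1 + n2)
    observed = abs(x / n1 - y / n2)
    hits = 0
    for _ in range(sims):
        sx, sy = poisson(pooled * n1, rng), poisson(pooled * n2, rng)
        if abs(sx / n1 - sy / n2) >= observed:
            hits += 1
    return hits / sims


def power(a1: float, a2: float, n1: int, n2: int, trials: int = 200,
          sims: int = 2_000, cutoff: float = 0.0005, seed: int = 1) -> float:
    """Fraction of simulated experiments (true abundances a1, a2) called significant."""
    rng = random.Random(seed)
    called = 0
    for t in range(trials):
        x, y = poisson(a1 * n1, rng), poisson(a2 * n2, rng)
        if p_chance(x, y, n1, n2, sims=sims, seed=t) <= cutoff:
            called += 1
    return called / trials


print(p_chance(36, 6, 60_000, 60_000, sims=20_000))  # very small
print(p_chance(20, 20, 60_000, 60_000, sims=1_000))  # 1.0 (no observed difference)
```

`power(0.0001, 0.0006, 60_000, 60_000)` estimates how often a six-fold difference like the one in note (9) would be called. With only 2,000 null simulations per call, a transcript passes the cutoff only if at most one simulation reaches the observed difference, so this resolution is coarse. The notes state that 100,000 simulations were used per transcript.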
EXAMPLE 3 

This example demonstrates the similarities and differences between 
cancer cell line transcription and transcription of primary cancer tissues. 
To determine how many of the 289 differences were independent of the cellular 
microenvironment of cancers in vivo, SAGE data from CR cancer cell lines were 
compared to those from primary CR cancer tissues (Figs. 1B, 1C). Perhaps 
surprisingly, the majority of transcripts (130 of 181) that were expressed at 
reduced levels in cancer cells in vivo were also expressed at significantly lower 
levels in the cell lines (Fig. 1B). Likewise, a significant fraction of the 
transcripts expressed at increased levels in primary cancers were also expressed 
at higher levels in the CR cancer cell lines (Fig. 1C). Thus, many of the gene 
expression differences that distinguish normal from tumor cells in vivo persist 
during in vitro growth. However, despite these similarities there were also 
many differences. For example, only 47 of 228 genes expressed at higher 
levels in CR cancer cell lines were also expressed at high levels in the primary 
CR cancers. 

In combination, comparing the expression pattern of CR cancer cells 
(in vivo or in vitro) to normal colon revealed 548 differentially expressed 
transcripts (Figs. 1B, 1C; Tables 2 and 3). The average difference in expression 
for these transcripts was 15-fold. Although the power to detect a difference is 
lower for smaller differences, 92 transcripts that differed by less than 
three-fold were identified among the 548 transcripts. However, those genes exhibiting 
the greatest differences in expression are likely to be the most biologically 
important. 
EXAMPLE 4 

This example demonstrates the similarities and differences between 
colorectal cancer transcription and pancreatic cancer transcription. 
To determine whether the changes noted in CR cancers were neoplasia-specific or 
cell-type-specific, we performed SAGE on mRNA derived from pancreatic cancers. 
A total of 404 transcripts were expressed at higher levels in pancreatic cancers 
compared to normal colon epithelium (examples in Fig. 2B). The majority 
(268) of these transcripts were pancreas-specific (10) (example in Fig. 2C), 
although 136 were also expressed at high levels in CR cancers. These 136 
transcripts constituted 47% of the 289 transcripts increased in CR cancers 
relative to normal colon and are likely to be related to the neoplastic process 
rather than to the specific cell type of origin. 
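The bookkeeping in this paragraph comes down to set operations on the lists of up-regulated tags. The following is a minimal sketch. The tag identifiers are hypothetical placeholders, not the actual SAGE tags, and the function name is illustrative.

```python
from typing import Set, Tuple


def split_pancreatic_increases(pancreatic_up: Set[str],
                               colorectal_up: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split transcripts increased in pancreatic cancer into those also increased in
    CR cancer (candidate neoplasia-related) and the rest (candidate cell-type-specific)."""
    shared = pancreatic_up & colorectal_up
    pancreas_only = pancreatic_up - colorectal_up
    return shared, pancreas_only


pancreatic_up = {"TAG1", "TAG2", "TAG3", "TAG4"}  # hypothetical tags
colorectal_up = {"TAG2", "TAG4", "TAG5"}
shared, pancreas_only = split_pancreatic_increases(pancreatic_up, colorectal_up)
print(sorted(shared), sorted(pancreas_only))  # ['TAG2', 'TAG4'] ['TAG1', 'TAG3']

# With the counts reported above: 404 - 136 = 268 pancreas-specific transcripts,
# and 136 of the 289 transcripts increased in CR cancers is about 47%.
print(404 - 136, round(100 * 136 / 289))  # 268 47
```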

EXAMPLE 5 

This example demonstrates the reproducibility of the transcription 
patterns observed among a larger number of cancer samples. 

One question that arose from these data is the potential heterogeneity 
of expression between individual tumors. The SAGE data were acquired from 
two samples of each tissue type (normal colon, primary CR cancer, CR cancer 
cell line, etc.). To examine the generality of these expression profiles, we 
arbitrarily selected 27 differentially expressed transcripts and evaluated them 
in six to twelve samples of normal colon and primary cancers by Northern blot 
analysis (11). In general, expression patterns were very reproducible among 
different samples. Of 10 genes with elevated expression in normal colon 
relative to CR cancers as determined by SAGE, each was detected in the 
normal colon samples and was expressed at considerably lower levels in tumors 
(examples in Fig. 2A). Similarly, most of the genes identified by SAGE as 
increased in CR or pancreatic cancers were confirmed to be reproducibly 
expressed in the majority of primary cancers examined by Northern blot 
(examples in Fig. 2A). It is important to note, however, that there were 
differences among the cancers, with a few cancers exhibiting particularly high 
or low levels of individual transcripts. Such differences in gene expression 
undoubtedly contribute to the observed heterogeneity in biological properties 
of cancers derived from the same organ (12). 

EXAMPLE 6 

This example demonstrates the identities of some of the transcripts 
which were found to be differentially expressed in tumor and normal tissues. 
What are the identities of the differentially expressed genes? Of the 548 
differentially expressed transcripts, 337 were tentatively identified through 
database comparisons. When tested, the great majority (93%) of these 
identifications proved to be legitimate (13), as expected from previous SAGE 
analyses. Although a large number of differentially expressed genes were 
identified, some simple patterns did emerge. For example, genes that were 
expressed at higher levels in normal colon epithelium than in CR tumors were 
often differentiation-related. These genes included liver fatty acid binding 
protein, cytokeratin 20, carbonic anhydrase, guanylin and uroguanylin, 
which are known to be important for the normal physiology or architecture of 
the colon epithelium (Table 2). On the other hand, genes that were increased 
in CR cancers were often related to the robust growth characteristics that these 
cells exhibit. For example, gene products associated with protein synthesis, 
including 48 ribosomal proteins, five elongation factors, and five genes 
involved in glycolysis were observed to be elevated in both CR and pancreatic 
cancers compared to normal colon cells. Although the majority of the 
transcripts could not have been predicted to be differentially expressed in 
cancers, several have previously been shown to be dysregulated in neoplastic 
cells. The latter included IGF-II, B23 nucleophosmin, the Pi form of 
glutathione S-transferase, and several ribosomal proteins, which were all 
increased in cancer cells as previously reported. Likewise, DRA and gelsolin 
were both decreased in cancer as previously reported. Surprisingly, two widely 
studied oncogenes, c-fos and c-erbB3, were expressed at much higher levels in 
normal colon epithelium than in CR cancers, in contrast to their up-regulation in 
transformed cells. 
In summary, these data provide basic information necessary for 
understanding the gene expression differences that underlie cancer phenotypes. 
They additionally provide a necessary framework for interpreting the 
significance of individual differentially expressed genes. Although this study 
demonstrated that a large number of such differences exist (approximately 500 
at the depth of analysis employed), it was equally remarkable that the fraction 
of transcripts exhibiting significant differences was relatively small, 
representing 1.5% of the transcripts detected in any given cell type (26). The 
fact that many, but not all, of the differences were preserved during in vitro 
culture demonstrates the utility of cultured lines for examination of some 
aspects of gene expression, but also provides a note of caution in relying on 
such lines to perfectly mimic tumors in their natural environment. Finally, the 
finding that hundreds of specific genes are expressed at different levels in CR 
cancers, and that some of these are also expressed differentially in pancreatic 
cancers, provides a wealth of new reagents for future biologic and diagnostic 
experimentation. 
REFERENCES AND NOTES 

1. M. D. Adams, et al., Nature 377, supp. 28, 3 (1995); M. 
Schena, D. Shalon, R. W. Davis, P. O. Brown, Science 270, 467 (1995); J. 
DeRisi, et al., Nature Genetics 14, 457 (1996); T. M. Gress, et al., Oncogene 
13, 1819 (1996); D. J. Lockhart, et al., Nature Biotechnology 14, 1675 
(1996); M. Schena, et al., Proc Natl Acad Sci USA 93, 10614 (1996). 

2. V. E. Velculescu, L. Zhang, B. Vogelstein, K. W. Kinzler, 
Science 270, 484 (1995); V. E. Velculescu, et al., Cell 88, 243 (1997). 

3. To minimize individual variation, approximately equal numbers 
of tags (30,000) were derived from two different patients for each tissue. For 
primary tumors (two CR carcinomas and two pancreatic adenocarcinomas), 
RNA was isolated from portions of tumors judged to contain 60%-90% tumor 
cells by histopathology. The cells grown in vitro were derived from CR 
(SW837, Caco2) and pancreatic (ASPC-1, PL45) cancer cell lines. CR 
epithelial cells were isolated from sections of normal colon mucosa from two 
patients using EDTA as previously described [S. Nakamura, I. Kino, S. Baba, 
Gut 34, 1240 (1993)]. Histopathology confirmed that the isolated cells were 
greater than 90% epithelial. Isolation of poly(A) RNA and SAGE were 
performed as previously described (2). SAGE data were analyzed by means of 
SAGE software and GenBank Release 95 as previously described (2). 

4. A total of 69,393 different SAGE tags were identified among 
the 303,706 tags analyzed. A small fraction of these different tags were likely 
due to sequencing errors. SAGE analysis of yeast (2), wherein the entire 
genomic sequence is known, demonstrated a sequencing error rate of ~0.7% per base, 
translating to a SAGE tag error rate of 6.8% ($1 - 0.993^{10}$). Because these 
sequencing mistakes are essentially random, they do not substantially affect the 
analysis, although they could artificially inflate the number of unique genes 
identified. Therefore, to be conservative, we reduced our estimate of unique 
genes identified by this maximum tag error rate (i.e., 6.8% of 303,706 total 
tags). The number of different tags derived from the same gene due to 
alternative splicing was assumed to be negligible. 
5. Abundances can be simply determined by dividing the observed 
number of tags for a given transcript by the total number of tags obtained. An 
estimate of approximately 300,000 transcripts per cell was used to convert the 
abundances to copies per cell [N. D. Hastie, J. O. Bishop, Cell 9, 761 (1976)]. 

6. J. O. Bishop, J. G. Morton, M. Rosbash, M. Richardson, Nature 
250, 199 (1974); B. Lewin, Gene Expression, Vol. 2 (John Wiley and Sons, 
New York, 1980). 

7. Computer simulations indicated that analysis of 300,000 tags 
would yield a 92% chance of detecting a tag for a transcript whose expression 
was at least three copies per cell on average among the tissues examined and 
assuming 300,000 transcripts per cell. 

8. To minimize the number of assumptions and to account for the 
large number of comparisons being made, Monte Carlo analysis was used for 
determining statistical significance. The null hypothesis was that the level, 
kind, and distribution of transcripts were the same for cancer and normal cells. 
For each transcript, 100,000 simulations were performed to determine the 
relative likelihood due to chance alone ("p-chance") of obtaining a difference 
in expression equal to or greater than the observed difference, given the null 
hypothesis. This likelihood was converted to an absolute probability value by 
simulating 40 experiments in which a representative number of transcripts 
(27,993 transcripts in each experiment) was identified and compared. The 
distribution of transcripts used for these simulations was derived from the 
average level of expression observed in the original samples. The distribution 
of the p-chance scores obtained in the 40 simulated experiments (false 
positives) was then compared to those obtained experimentally. Based on this 
comparison, a maximum value of 0.0005 was chosen for p-chance. This 
yielded a false positive rate that was no higher than 0.01 for the least 
significant p-chance value below the cutoff. 

9. Two hundred simulations assuming an abundance of 0.0001 in 
one sample and 0.0006 in a second sample revealed a significant difference (P 
< 0.01, [8]) 95% of the time. 
10. It is not possible to obtain pancreatic ductal epithelium, from 
which pancreatic carcinomas arise, in sufficient quantities to perform SAGE. 
It is therefore not possible to determine whether these transcripts were derived 
from genes that were highly expressed only in pancreatic cancers or were also 
expressed in pancreatic duct cells. 

11. Total RNA isolation and Northern blot analysis were performed 
as described [W. S. el-Deiry, et al., Cell 75, 817 (1993)]. 

12. A. H. Owens, D. S. Coffey, S. B. Baylin, Eds., Tumor Cell 
Heterogeneity: Origins and Implications (Academic Press, New York, 1982). 

13. Northern blot analyses were done on 45 of the 337 differentially 
expressed transcripts with tentative database matches. In three cases, the 
transcript was not differentially expressed as predicted by SAGE; for the 
purposes of this calculation, these cases were presumed to represent incorrect 
database matches. 

14. D. C. Rubin, D. E. Ong, J. I. Gordon, Proc Natl Acad Sci USA 
86, 1278 (1989); K. Okubo, J. Yoshii, H. Yokouchi, M. Kameyama, K. 
Matsubara, DNA Res 1, 37 (1994). 

15. R. Moll, et al., Differentiation 53, 75 (1993). 

16. J. Sowden, S. Leigh, I. Talbot, J. Delhanty, Y. Edwards, 
Differentiation 53, 67 (1993). 

17. F. J. de Sauvage, et al., Proc Natl Acad Sci USA 89, 9089 (1992). 

18. R. C. Wiegand, et al., FEBS Lett 311, 150 (1992). 

19. J. V. Tricoli, et al., Cancer Res 46, 6169 (1986); S. Lambert, 
J. Vivario, I. Boniver, R. Gol-Winkler, Int J Cancer 46, 405 (1990). 

20. W. Y. Chan, et al., Biochemistry 28, 1033 (1989). 

21. J. D. Hayes, D. J. Pulford, Crit Rev Biochem Mol Biol 30, 445 (1995). 

22. G. F. Barnard, et al., Cancer Res 52, 3067 (1992); P. J. Chiao, 
D. M. Shin, P. G. Sacks, W. K. Hong, M. A. Tainsky, Mol Carcinog 5, 219 
(1992); N. Kondoh, C. W. Schweinfest, K. W. Henderson, T. S. Papas, 
Cancer Res 52, 791 (1992); G. F. Barnard, et al., Cancer Res 53, 4048 
(1993); M. G. Denis, et al., Int J Cancer 55, 275 (1993); J. M. Frigerio, et 
al., Hum Mol Genet 4, 37 (1995). 

23. C. W. Schweinfest, K. W. Henderson, S. Suster, N. Kondoh, 
T. S. Papas, Proc Natl Acad Sci USA 90, 4166 (1993). 

24. M. Tanaka, et al., Cancer Res 55, 3228 (1995); D. Medina, F. 
S. Kittrell, C. J. Oborn, M. Schwartz, Cancer Res 53, 668 (1993). 

25. A. D. Miller, T. Curran, I. M. Verma, Cell 36, 51 (1984); M. 
H. Kraus, W. Issing, T. Miki, N. C. Popescu, S. A. Aaronson, Proc Natl Acad 
Sci USA 86, 9193 (1989). 

26. In the comparison of normal and neoplastic colon tissue, 548 
differentially expressed transcripts were identified among the 36,125 unique transcripts. 

27. All references cited are hereby incorporated by reference herein. 

28. Sequence tags in Tables 2-4 are consecutively numbered to 
form SEQ ID NOS: 1-732. 
CLAIMS 

1. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the 
group consisting of those shown in Table 3; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be lower in the first sample than in the second 
sample. 

2. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a colonic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colonic tissue, and wherein the transcript is identified by a tag selected from the 
group consisting of those shown in Table 2; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

3. The method of claim 1 wherein a comparison of at least two of said 
transcripts is performed. 
4. The method of claim 2 wherein a comparison of at least two of said 
transcripts is performed. 
5. The method of claim 1 wherein a comparison of at least five of said 
transcripts is performed. 

6. The method of claim 2 wherein a comparison of at least five of said 
transcripts is performed. 

7. The method of claim 1 wherein a comparison of at least ten of said 
transcripts is performed. 

8. The method of claim 2 wherein a comparison of at least ten of said 
transcripts is performed. 

9. The method of claim 1 wherein a comparison of at least twenty of said 
transcripts is performed. 

10. The method of claim 2 wherein a comparison of at least twenty of said 
transcripts is performed. 

11. The method of claim 1 wherein a comparison of at least thirty of said 
transcripts is performed. 

12. The method of claim 2 wherein a comparison of at least thirty of said 
transcripts is performed. 

13. An isolated and purified human nucleic acid molecule which comprises 
a SAGE tag selected from SEQ ID NO: 1-732. 

14. The nucleic acid molecule of claim 13 which is a cDNA molecule. 
15. The nucleic acid molecule of claim 13 wherein the SAGE tag is located 
at the 3' end of the molecule, adjacent to the 3'-most NlaIII restriction enzyme 
site. 

16. An isolated nucleotide probe comprising at least 10 nucleotides of a 
human nucleic acid molecule, wherein the human nucleic acid molecule 
comprises a SAGE tag selected from SEQ ID NO: 1-732. 

17. The probe of claim 16 which comprises the selected SAGE tag. 

18. A diagnostic reagent for evaluating neoplasia of a colorectal tissue, 
comprising at least 2 probes according to claim 16. 

19. The diagnostic reagent of claim 18 which comprises at least 5 probes 
according to claim 16. 

20. The diagnostic reagent of claim 18 which comprises at least 10 probes 
according to claim 16. 

21. The diagnostic reagent of claim 18 which comprises at least 20 probes 
according to claim 16. 

22. The diagnostic reagent of claim 18 which comprises at least 30 probes 
according to claim 16. 

23. A diagnostic reagent for evaluating neoplasia of a colorectal tissue, 
comprising at least 2 probes according to claim 17. 



24. A method of diagnosing pancreatic cancer in a sample suspected of 
being neoplastic, comprising the steps of: 
comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a pancreatic tissue 
suspected of being neoplastic and the second sample is of a normal human 
colon tissue, wherein said transcript is identified by a tag selected from the 
group consisting of those shown in Table 4; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

25. A method of diagnosing cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a tissue suspected of 
being neoplastic and the second sample is of a normal human tissue of the 
same tissue type, wherein said transcript is identified by a tag selected from the 
group consisting of those shown in Table 5; 

identifying the first sample as neoplastic when the level of the 
at least one transcript is found to be higher in the first sample than in the 
second sample. 

26. A method to aid in the determination of a prognosis for a colon cancer 
patient, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic colonic 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those 
shown in Table 3; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be lower in the first sample than in the second sample. 

27. A method to aid in determining a prognosis for a patient with colon 
cancer, comprising the steps of: 
comparing the level of at least one transcript in a first tissue 
sample to a second sample, wherein the first sample is of a colonic cancer 
tissue and the second sample is of a normal human colonic tissue, and wherein 
the transcript is identified by a tag selected from the group consisting of those 
shown in Table 2; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

28. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table 
3; 

identifying the first sample as neoplastic when the level of 
expression of the protein is found to be lower in the first sample than in the 
second sample. 

29. A method of diagnosing colon cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
colonic tissue suspected of being neoplastic and the second sample is of a 
normal human colonic tissue, and wherein the protein is encoded by a transcript 
identified by a tag selected from the group consisting of those shown in Table 
2; 

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 
30. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic 
pancreatic tissue and the second sample is of a normal human colon tissue, 
wherein said transcript is identified by a tag selected from the group consisting 
of those shown in Table 4; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

31. A method to aid in providing a prognosis for a cancer patient, 
comprising the steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of a neoplastic tissue 
and the second sample is of a normal human tissue of the same tissue type, 
wherein said transcript is identified by a tag selected from the group consisting 
of those shown in Table 5; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

32. A method of diagnosing pancreatic cancer in a sample suspected of 
being neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein 
encoded by a transcript in a first sample of a tissue to a second sample, wherein 
the first sample is of a pancreatic tissue suspected of being neoplastic and the 
second sample is of a normal human colon tissue, wherein said protein is 
encoded by a transcript identified by a tag selected from the group consisting 
of those shown in Table 4; 

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 
33. A method of diagnosing cancer in a sample suspected of being 
neoplastic, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
tissue suspected of being neoplastic and the second sample is of a normal 
human tissue, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 5; 

identifying the first sample as neoplastic when expression of the 
protein is found to be higher in the first sample than in the second sample. 

34. A method to aid in the determination of a prognosis for a colon cancer 
patient, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic colonic tissue and the second sample is of a normal human colonic 
tissue, and wherein the protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 3; 

determining a poorer prognosis if the level of expression is 
found to be lower in the first sample than in the second sample. 

35. A method to aid in determining a prognosis for a patient with colon 
cancer, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first tissue sample to a second sample, wherein the first sample is of a colonic 
cancer tissue and the second sample is of a normal human colonic tissue, and 
wherein the protein is encoded by a transcript identified by a tag selected from 
the group consisting of those shown in Table 2; 

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 
36. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic pancreatic tissue and the second sample is of a normal human colon 
tissue, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 4; 

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

37. A method to aid in providing a prognosis for a cancer patient, 
comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample of a tissue to a second sample, wherein the first sample is of a 
neoplastic tissue and the second sample is of a normal human tissue of the same 
tissue type, wherein said protein is encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Table 5; 

determining a poorer prognosis if the level of expression is 
found to be higher in the first sample than in the second sample. 

38. A method of treating a cancer cell, comprising the step of: 

administering to a cancer cell an antibody which specifically 
binds to a protein encoded by a transcript identified by a tag selected from the 
group consisting of those shown in Tables 2, 4, and 5, wherein the antibody is 
linked to a cytotoxic agent. 

39. An antibody linked to a cytotoxic agent, wherein the antibody 
specifically binds to a protein encoded by a transcript identified by a tag 
selected from the group consisting of those shown in Tables 2, 4, and 5. 
40. A method of detecting colon cancer in a patient, comprising the steps 
of: 

comparing the level of at least one protein in a first body sample 
to a second body sample, wherein the first sample is a body sample of the 
patient and the second sample is of a normal human, wherein the protein is 
encoded by a transcript identified by a tag selected from the group consisting 
of those shown in Table 2, wherein the first and second body sample is a 
sample selected from the group consisting of blood, urine, feces, sputum, and 
serum; 

identifying neoplasia when the level of the at least one protein 
is found to be higher in the first sample than in the second sample. 

41. A method of detecting pancreatic cancer in a patient, comprising the 
steps of: 

comparing the level of at least one protein encoded by a 
transcript in a first sample of a tissue to a second sample, wherein the first 
sample is of the patient and the second sample is of a normal human, wherein 
said protein is encoded by a transcript identified by a tag selected from the 
group consisting of those shown in Table 4, wherein the first and second sample 
is a sample selected from the group consisting of blood, urine, feces, sputum, 
and serum; 

identifying neoplasia when the level of the at least one protein 
is found to be higher in the first sample than in the second sample. 

42. A method of detecting cancer in a patient, comprising the steps of: 

comparing the level of at least one protein in a first sample to 
a second sample, wherein the first sample is of the patient and the second sample 
is of a normal human, wherein said protein is encoded by a transcript identified 
by a tag selected from the group consisting of those shown in Table 5, wherein 
the first and second body sample is a sample selected from the group consisting 
of blood, urine, feces, sputum, and serum; 
identifying neoplasia when the level of the at least one protein 
is found to be higher in the first sample than in the second sample. 

43. A method to aid in determining a prognosis for a patient with colon 
cancer, comprising the steps of: 

comparing the level of at least one protein in a first sample to 
a second sample, wherein the first sample is of a colonic cancer patient and the 
second sample is of a normal human, wherein the protein is encoded by a 
transcript identified by a tag selected from the group consisting of those shown 
in Table 2, wherein the first and second sample is a sample selected from the 
group consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
protein is found to be higher in the first sample than in the second sample. 

44. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of at least one protein in a first sample to 
a second sample, wherein the first sample is of a pancreatic cancer patient and 
the second sample is of a normal human, wherein said protein is encoded by a 
transcript identified by a tag selected from the group consisting of those shown 
in Table 4, wherein said first and second sample is a sample selected from the 
group consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
protein is found to be higher in the first sample than in the second sample. 

45. A method to aid in providing a prognosis for a cancer patient, 
comprising the steps of: 

comparing the level of expression of at least one protein in a 
first sample to a second sample, wherein the first sample is of a cancer patient 
and the second sample is of a normal human, wherein said protein is encoded 
by a transcript identified by a tag selected from the group consisting of those 
shown in Table 5, wherein the first and second sample is a sample selected from 
the group consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
protein is found to be higher in the first sample than in the second sample. 

46. A method of detecting colon cancer in a patient, comprising the steps 
of: 
comparing the level of at least one transcript in a first body 
sample to a second body sample, wherein the first sample is a body sample of 
the patient and the second sample is of a normal human, wherein the transcript 
is identified by a tag selected from the group consisting of those shown in 
Table 2, wherein the first and second body sample is a sample selected from the 
group consisting of blood, urine, feces, sputum, and serum; 

identifying neoplasia when the level of the at least one transcript 
is found to be higher in the first sample than in the second sample. 

47. A method of detecting pancreatic cancer in a patient, comprising the 
steps of: 

comparing the level of at least one transcript in a first sample of 
a tissue to a second sample, wherein the first sample is of the patient and the 
second sample is of a normal human, wherein said transcript is identified by a 
tag selected from the group consisting of those shown in Table 4, wherein the 
first and second sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

identifying neoplasia when the level of the at least one transcript 
is found to be higher in the first sample than in the second sample. 

48. A method of detecting cancer in a patient, comprising the steps of: 

comparing the level of at least one transcript in a first sample to 
a second sample, wherein the first sample is of the patient and the second sample 
is of a normal human, wherein said transcript is identified by a tag selected 
from the group consisting of those shown in Table 5, wherein the first and second 
body sample is a sample selected from the group consisting of blood, urine, 
feces, sputum, and serum; 

identifying neoplasia when the level of the at least one transcript 
is found to be higher in the first sample than in the second sample. 

49. A method to aid in determining a prognosis for a patient with colon 
cancer, comprising the steps of: 

comparing the level of at least one transcript in a first sample to 
a second sample, wherein the first sample is of a colonic cancer patient and the 
second sample is of a normal human, wherein the transcript is identified by a 
tag selected from the group consisting of those shown in Table 2, wherein the 
first and second sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 



50. A method to aid in determining a prognosis of a patient having 
pancreatic cancer, comprising the steps of: 

comparing the level of at least one transcript in a first sample to 
a second sample, wherein the first sample is of a pancreatic cancer patient and 
the second sample is of a normal human, wherein said transcript is identified by 
a tag selected from the group consisting of those shown in Table 4, wherein said 
first and second sample is a sample selected from the group consisting of 
blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 



51. A method to aid in providing a prognosis for a cancer patient, 
comprising the steps of: 
comparing the level of expression of at least one transcript in 
a first sample to a second sample, wherein the first sample is of a cancer patient 
and the second sample is of a normal human, wherein said transcript is 
identified by a tag selected from the group consisting of those shown in Table 5, 
wherein the first and second sample is a sample selected from the group 
consisting of blood, urine, feces, sputum, and serum; 

determining a poorer prognosis if the level of the at least one 
transcript is found to be higher in the first sample than in the second sample. 

52. A method for screening for candidate agents that modulate the 
expression of a polynucleotide selected from the group consisting of the 
polynucleotides in SEQ ID NOS:1-732 or their respective complements, 
comprising contacting a test agent with a colon or pancreatic cell and 
monitoring expression of the polynucleotide, wherein the test agent which 
modifies the expression of the polynucleotide is a candidate agent. 